1. 07 Feb 2014, 2 commits
    • ACPI / hotplug / PCI: Rework the handling of eject requests · dd2151be
      Committed by Rafael J. Wysocki
      To avoid the need to install a hotplug notify handler for each ACPI
      namespace node representing a device and having a matching scan
      handler, move the check whether or not the ejection of the given
      device is enabled through its scan handler from acpi_hotplug_notify_cb()
      to acpi_generic_hotplug_event().  Also, move the evaluation of _OST
      with the ACPI_OST_SC_EJECT_IN_PROGRESS status code to
      acpi_generic_hotplug_event(), because in acpi_hotplug_notify_cb() or
      in acpi_eject_store() we don't really know whether or not the eject
      is actually going to be in progress (for example,
      acpi_hotplug_execute() may still fail without queuing up the work
      item).
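
      A minimal sketch of the consolidated flow: the scan-handler check and
      the _OST evaluation both live in one place now. Helpers marked
      "hypothetical" are illustrative names, not necessarily the exact
      kernel API:

      static int acpi_generic_hotplug_event(struct acpi_device *adev, u32 type)
      {
              switch (type) {
              case ACPI_NOTIFY_BUS_CHECK:
                      return acpi_scan_bus_check(adev);       /* hypothetical */
              case ACPI_NOTIFY_DEVICE_CHECK:
                      return acpi_scan_device_check(adev);    /* hypothetical */
              case ACPI_NOTIFY_EJECT_REQUEST:
                      /* Check moved here from acpi_hotplug_notify_cb(). */
                      if (adev->handler && !adev->handler->hotplug.enabled)
                              return -EPERM;
                      /*
                       * _OST is evaluated only once we know the eject will
                       * really be attempted; helper name is an assumption.
                       */
                      acpi_evaluate_hotplug_ost(adev->handle, type,
                                                ACPI_OST_SC_EJECT_IN_PROGRESS,
                                                NULL);
                      return acpi_scan_hot_remove(adev);      /* hypothetical */
              }
              return -EINVAL;
      }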
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Tested-by: Mika Westerberg <mika.westerberg@linux.intel.com>
    • ACPI / hotplug / PCI: Consolidate ACPIPHP with ACPI core hotplug · 3c2cc7ff
      Committed by Rafael J. Wysocki
      The ACPI-based PCI hotplug (ACPIPHP) code currently attaches its
      hotplug context objects directly to ACPI namespace nodes representing
      hotplug devices.  However, after recent changes causing struct
      acpi_device to be created for every namespace node representing a
      device (regardless of its status), that is not necessary any more.
      Moreover, it's vulnerable to the theoretical issue that the ACPI
      handle passed in the context between handle_hotplug_event() and
      hotplug_event_work() may become invalid in the meantime (as a
      result of a concurrent table unload).
      
      In principle, this issue might be addressed by adding a non-empty
      release handler for ACPIPHP hotplug context objects analogous to
      acpi_scan_drop_device(), but that would duplicate the code in that
      function and in acpi_device_del_work_fn().  For this reason, it's
      better to modify ACPIPHP to attach its device hotplug contexts to
      struct device objects representing hotplug devices and make it
      use acpi_hotplug_notify_cb() as its notify handler.  At the same
      time, acpi_device_hotplug() can be modified to dispatch the new
      .hp.event() callback pointing to acpiphp_hotplug_event() from ACPI
      device objects associated with PCI devices or use the generic
      ACPI device hotplug code for device objects with matching scan
      handlers.
      
      This allows the existing code duplication between ACPIPHP and the
      ACPI core to be reduced too and makes further ACPI-based device
      hotplug consolidation possible.
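
      A sketch of the context shape this implies; the struct and field
      names are assumptions for illustration:

      /* Attached to struct acpi_device instead of a bare ACPI handle. */
      struct acpi_hotplug_context {
              struct acpi_device *self;
              /* Points to acpiphp_hotplug_event() for ACPI device objects
               * associated with PCI devices. */
              int (*event)(struct acpi_device *adev, u32 type);
      };

      /* acpi_device_hotplug() can then dispatch along these lines: */
      if (adev->hp)
              error = adev->hp->event(adev, type);
      else if (adev->handler)
              error = acpi_generic_hotplug_event(adev, type);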
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
  2. 06 Feb 2014, 14 commits
  3. 04 Feb 2014, 5 commits
    • ACPI / hotplug / PCI: Fix bridge removal race vs dock events · af9d8adc
      Committed by Rafael J. Wysocki
      If a PCI bridge with an ACPIPHP context attached is removed via
      sysfs, the code path executed as a result is the following:
      
      pci_stop_and_remove_bus_device_locked
       pci_remove_bus
        pcibios_remove_bus
         acpi_pci_remove_bus
          acpiphp_remove_slots
           cleanup_bridge
            unregister_hotplug_dock_device (drops dock references to the bridge)
           put_bridge
            free_bridge
             acpiphp_put_context (for each child, under context lock)
              kfree (context)
      
      Now, if a dock event affecting one of the bridge's child devices
      occurs (roughly at the same time), it will lead to the following code
      path:
      
      acpi_dock_deferred_cb
       dock_notify
        handle_eject_request
         hot_remove_dock_devices
          dock_hotplug_event
           hotplug_event (dereferences context)
      
      That may lead to a kernel crash in hotplug_event() if it is executed
      after the last kfree() in the bridge removal code path.
      
      To prevent that from happening, add a wrapper around hotplug_event()
      called dock_event() and point the .handler pointer in acpiphp_dock_ops
      to it.  Make that wrapper retrieve the device's ACPIPHP context using
      acpiphp_get_context() (instead of taking it from the data argument)
      under acpiphp_context_lock and check if the parent bridge's
      is_going_away flag is set.  If that flag is set, it will return
      immediately and if it is not set it will grab a reference to the
      device's parent bridge before executing hotplug_event().
      
      Then, in the above scenario, the reference to the parent bridge
      held by dock_event() will prevent free_bridge() from being executed
      for it until hotplug_event() returns.
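
      A sketch of the wrapper described above (simplified; details may
      differ from the final code):

      static void dock_event(acpi_handle handle, u32 type, void *data)
      {
              struct acpiphp_context *context;

              mutex_lock(&acpiphp_context_lock);
              context = acpiphp_get_context(handle);
              if (!context || context->func.parent->is_going_away) {
                      mutex_unlock(&acpiphp_context_lock);
                      return;
              }
              /* Pin the parent bridge so free_bridge() cannot run under us. */
              get_bridge(context->func.parent);
              acpiphp_put_context(context);
              mutex_unlock(&acpiphp_context_lock);

              hotplug_event(handle, type, data);

              put_bridge(context->func.parent);
      }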
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
    • ACPI / hotplug / PCI: Fix bridge removal race in handle_hotplug_event() · 1b360f44
      Committed by Rafael J. Wysocki
      If a PCI bridge with an ACPIPHP context attached is removed via
      sysfs, the code path executed as a result is the following:
      
      pci_stop_and_remove_bus_device_locked
       pci_remove_bus
        pcibios_remove_bus
         acpi_pci_remove_bus
          acpiphp_remove_slots
           cleanup_bridge
           put_bridge
            free_bridge
             acpiphp_put_context (for each child, under context lock)
              kfree (child context)
      
      Now, if a hotplug notify is dispatched for one of the bridge's
      children and the timing is such that handle_hotplug_event() for
      that notify is executed while free_bridge() above is running,
      the get_bridge(context->func.parent) in handle_hotplug_event()
      will not really help, because it is too late to prevent the bridge
      from going away and the child's context may be freed before
      hotplug_event_work() scheduled from handle_hotplug_event()
      dereferences the pointer to it passed via the data argument.
      That will cause a kernel crash to happen in hotplug_event_work().
      
      To prevent that from happening, make handle_hotplug_event()
      check the is_going_away flag of the function's parent bridge
      (under acpiphp_context_lock) and bail out if it's set.  Also,
      make cleanup_bridge() set the bridge's is_going_away flag under
      acpiphp_context_lock so that it cannot be changed between the
      check and the subsequent get_bridge(context->func.parent) in
      handle_hotplug_event().
      
      Then, in the above scenario, handle_hotplug_event() will notice
      that context->func.parent->is_going_away is already set and it
      will exit immediately preventing the crash from happening.
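
      A sketch of the two sides of the fix (simplified):

      /* cleanup_bridge(): mark the bridge under the context lock. */
      mutex_lock(&acpiphp_context_lock);
      bridge->is_going_away = true;
      mutex_unlock(&acpiphp_context_lock);

      /* handle_hotplug_event(): check the flag under the same lock. */
      static void handle_hotplug_event(acpi_handle handle, u32 type, void *data)
      {
              struct acpiphp_context *context = data;

              mutex_lock(&acpiphp_context_lock);
              if (context->func.parent->is_going_away) {
                      mutex_unlock(&acpiphp_context_lock);
                      return;         /* too late; bail out */
              }
              get_bridge(context->func.parent);
              mutex_unlock(&acpiphp_context_lock);
              /* ... queue hotplug_event_work() as before ... */
      }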
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
    • ACPI / hotplug / PCI: Scan root bus under the PCI rescan-remove lock · d42f5da2
      Committed by Rafael J. Wysocki
      Since acpiphp_check_bridge() called by acpiphp_check_host_bridge()
      does things that require PCI rescan-remove locking around it,
      make acpiphp_check_host_bridge() use that locking.
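
      A sketch of the change; acpiphp_handle_to_bridge() is taken from the
      ACPIPHP code, but treat the exact shape as an assumption:

      void acpiphp_check_host_bridge(acpi_handle handle)
      {
              struct acpiphp_bridge *bridge = acpiphp_handle_to_bridge(handle);

              if (bridge) {
                      /* Serialize against sysfs-driven rescan/remove. */
                      pci_lock_rescan_remove();
                      acpiphp_check_bridge(bridge);
                      pci_unlock_rescan_remove();
                      put_bridge(bridge);
              }
      }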
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Tested-by: Mika Westerberg <mika.westerberg@linux.intel.com>
    • ACPI / hotplug / PCI: Move PCI rescan-remove locking to hotplug_event() · f41b3261
      Committed by Rafael J. Wysocki
      Commit 9217a984 (ACPI / hotplug / PCI: Use global PCI rescan-remove
      locking) modified ACPIPHP to protect its PCI device removal and addition
      code paths from races against sysfs-driven rescan and remove operations
      with the help of PCI rescan-remove locking.  However, it overlooked
      the fact that hotplug_event_work() is not the only caller of
      hotplug_event(), which may also be called from dock_hotplug_event(),
      and that code path is missing the PCI rescan-remove locking.  This
      means that, although
      the PCI rescan-remove lock is held as appropriate during the handling
      of events originating from handle_hotplug_event(), the ACPIPHP's
      operations resulting from dock events may still suffer the race
      conditions that commit 9217a984 was supposed to eliminate.
      
      To address that problem, move the PCI rescan-remove locking from
      hotplug_event_work() to hotplug_event() so that it is used regardless
      of the way that function is invoked.
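
      A sketch of the move (simplified; only the lock placement matters
      here):

      static void hotplug_event(acpi_handle handle, u32 type, void *data)
      {
              /* ... */
              pci_lock_rescan_remove();   /* was taken in hotplug_event_work() */

              /* handle bus check / device check / eject request */

              pci_unlock_rescan_remove();
              /* ... */
      }

      Both hotplug_event_work() and dock_hotplug_event() now reach the lock
      through this single call site.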
      
      Revamps: 9217a984 (ACPI / hotplug / PCI: Use global PCI rescan-remove locking)
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Tested-by: Mika Westerberg <mika.westerberg@linux.intel.com>
    • ACPI / hotplug / PCI: Remove entries from bus->devices in reverse order · 2d7c1b77
      Committed by Rafael J. Wysocki
      According to the changelog of commit 29ed1f29 (PCI: pciehp: Fix null
      pointer deref when hot-removing SR-IOV device) it is unsafe to walk the
      bus->devices list of a PCI bus and remove devices from it in direct order,
      because that may lead to NULL pointer dereferences related to virtual
      functions.
      
      For this reason, change all of the bus->devices list walks in
      acpiphp_glue.c during which devices may be removed to be carried out in
      reverse order.
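
      Illustrative shape of such a walk, using the list helpers from
      <linux/list.h> (locals assumed):

      struct pci_dev *dev, *tmp;

      /*
       * Walk bus->devices backwards so virtual functions are removed
       * before the physical function they belong to.
       */
      list_for_each_entry_safe_reverse(dev, tmp, &bus->devices, bus_list)
              pci_stop_and_remove_bus_device(dev);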
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Tested-by: Mika Westerberg <mika.westerberg@linux.intel.com>
  4. 03 Feb 2014, 3 commits
  5. 02 Feb 2014, 1 commit
    • Revert "PCI: Remove from bus_list and release resources in pci_release_dev()" · 04480094
      Committed by Rafael J. Wysocki
      Revert commit ef83b078 "PCI: Remove from bus_list and release
      resources in pci_release_dev()" that made some nasty race conditions
      become possible.  For example, if a Thunderbolt link is unplugged
      and then replugged immediately, the pci_release_dev() resulting from
      the hot-remove code path may be racing with the hot-add code path
      which after that commit causes various kinds of breakage to happen
      (up to and including a hard crash of the whole system).
      
      Moreover, the problem that commit ef83b078 attempted to address
      cannot happen any more after commit 8a4c5c32 "PCI: Check parent
      kobject in pci_destroy_dev()", because pci_destroy_dev() will now
      return immediately if it has already been executed for the given
      device.
      
      Note, however, that the invocation of msi_remove_pci_irq_vectors()
      that commit ef83b078 removed from pci_free_resources() is not added
      back, because subsequent code changes depend on that modification.
      
      Fixes: ef83b078 (PCI: Remove from bus_list and release resources in pci_release_dev())
      Reported-by: Mika Westerberg <mika.westerberg@linux.intel.com>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  6. 01 Feb 2014, 2 commits
  7. 31 Jan 2014, 13 commits
    • drivers: xen: deaggressive selfballoon driver · bc1b0df5
      Committed by Bob Liu
      The current xen-selfballoon driver is too aggressive, which may cause
      OOM to be triggered more often, e.g. this bug reported by James:
      https://lkml.org/lkml/2013/11/21/158
      
      There are two main reasons:
      1) The original goal_page didn't account for some pages used by kernel
      space, like slab pages and pages used by device drivers.
      
      2) The balloon driver may not give memory back to the guest OS fast
      enough when the workload suddenly acquires a lot of physical memory.
      
      In both cases, the guest OS will suffer from memory pressure and OOM
      may be triggered.
      
      The fix is to make the xen-selfballoon driver less aggressive by
      adding an extra 10% of total RAM pages to goal_page.
      It's more valuable to keep the guest system reliable and responsive
      than to balloon those 10% of pages out to Xen.
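
      A sketch of the adjustment, assuming the driver accumulates its
      target in a local named goal_pages (name assumed) and uses the
      kernel's totalram_pages:

      /* Leave slack instead of ballooning all reclaimable memory away:
       * add 10% of total RAM pages to the self-balloon target. */
      goal_pages += totalram_pages / 10;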
      Signed-off-by: Bob Liu <bob.liu@oracle.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
    • xen/grant-table: Avoid m2p_override during mapping · 08ece5bb
      Committed by Zoltan Kiss
      The grant mapping API does m2p_override unnecessarily: only gntdev
      needs it; for blkback and the future netback patches it just causes
      lock contention, as those pages never go to userspace. Therefore this
      series does the following:
      - the original functions were renamed to __gnttab_[un]map_refs, with a new
        parameter m2p_override
      - based on m2p_override either they follow the original behaviour, or just set
        the private flag and call set_phys_to_machine
      - gnttab_[un]map_refs are now wrappers that call __gnttab_[un]map_refs
        with m2p_override false
      - a new function, gnttab_[un]map_refs_userspace, provides the old
        behaviour (see the sketch after the version notes below)
      
      It also removes a stray space from page.h and changes ret to 0 in the
      XENFEAT_auto_translated_physmap case, as that is the only possible
      return value there.
      
      v2:
      - move the storing of the old mfn in page->index to gnttab_map_refs
      - move the function header update to a separate patch
      
      v3:
      - a new approach to retain the old behaviour where it is needed
      - squash the patches into one
      
      v4:
      - move out the common bits from m2p* functions, and pass pfn/mfn as parameter
      - clear page->private before doing anything with the page, so m2p_find_override
        won't race with this
      
      v5:
      - change return value handling in __gnttab_[un]map_refs
      - remove a stray space in page.h
      - add detail why ret = 0 now at some places
      
      v6:
      - don't pass pfn to m2p* functions, just get it locally
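
      A sketch of the wrapper arrangement described above; the signatures
      are simplified and the placement of the m2p_override parameter is an
      assumption:

      int gnttab_map_refs(struct gnttab_map_grant_ref *map_ops,
                          struct gnttab_map_grant_ref *kmap_ops,
                          struct page **pages, unsigned int count)
      {
              /* Kernel-only users (blkback, netback): skip m2p_override. */
              return __gnttab_map_refs(map_ops, kmap_ops, pages, count, false);
      }

      int gnttab_map_refs_userspace(struct gnttab_map_grant_ref *map_ops,
                                    struct gnttab_map_grant_ref *kmap_ops,
                                    struct page **pages, unsigned int count)
      {
              /* gntdev: pages do reach userspace, keep the old behaviour. */
              return __gnttab_map_refs(map_ops, kmap_ops, pages, count, true);
      }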
      Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com>
      Suggested-by: David Vrabel <david.vrabel@citrix.com>
      Acked-by: David Vrabel <david.vrabel@citrix.com>
      Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
    • zram: remove zram->lock in read path and change it with mutex · e46e3315
      Committed by Minchan Kim
      Finally, we have separated the zram->lock dependency from the 32-bit
      stat/table handling, so there is no longer any reason to hold an
      rw_semaphore across the read and write paths. This patch removes the
      lock from the read path entirely and replaces the rw_semaphore with a
      mutex, so that:
      
      old:
      
        read-read: OK
        read-write: NO
        write-write: NO
      
      Now:
      
        read-read: OK
        read-write: OK
        write-write: NO
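
      A sketch of the resulting scheme (function names are from the zram
      driver; treat the exact shape as an assumption):

        static int zram_bvec_rw(struct zram *zram, struct bio_vec *bvec,
                                u32 index, int offset, struct bio *bio, int rw)
        {
                int ret;

                if (rw == READ) {
                        /* No zram->lock; meta->tb_lock protects the table. */
                        ret = zram_bvec_read(zram, bvec, index, offset, bio);
                } else {
                        mutex_lock(&zram->lock);        /* was down_write() */
                        ret = zram_bvec_write(zram, bvec, index, offset);
                        mutex_unlock(&zram->lock);
                }
                return ret;
        }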
      
      The data below shows the mixed workload performing about 11 times
      better. There is also an improvement on the write-write path, because
      the current rw_semaphore doesn't support SPIN_ON_OWNER; that's a side
      effect, but a good one for us.
      
      Write-related tests perform better (by 61% to 1058%), while the read
      path varies between -2.22% and 1.45%, all marginal within stddev.
      
        CPU 12
        iozone -t -T -l 12 -u 12 -r 16K -s 60M -I +Z -V 0
      
        ==Initial write                ==Initial write
        records: 10                    records: 10
        avg:  516189.16                avg:  839907.96
        std:   22486.53 (4.36%)        std:   47902.17 (5.70%)
        max:  546970.60                max:  909910.35
        min:  481131.54                min:  751148.38
        ==Rewrite                      ==Rewrite
        records: 10                    records: 10
        avg:  509527.98                avg: 1050156.37
        std:   45799.94 (8.99%)        std:   40695.44 (3.88%)
        max:  611574.27                max: 1111929.26
        min:  443679.95                min:  980409.62
        ==Read                         ==Read
        records: 10                    records: 10
        avg: 4408624.17                avg: 4472546.76
        std:  281152.61 (6.38%)        std:  163662.78 (3.66%)
        max: 4867888.66                max: 4727351.03
        min: 4058347.69                min: 4126520.88
        ==Re-read                      ==Re-read
        records: 10                    records: 10
        avg: 4462147.53                avg: 4363257.75
        std:  283546.11 (6.35%)        std:  247292.63 (5.67%)
        max: 4912894.44                max: 4677241.75
        min: 4131386.50                min: 4035235.84
        ==Reverse Read                 ==Reverse Read
        records: 10                    records: 10
        avg: 4565865.97                avg: 4485818.08
        std:  313395.63 (6.86%)        std:  248470.10 (5.54%)
        max: 5232749.16                max: 4789749.94
        min: 4185809.62                min: 3963081.34
        ==Stride read                  ==Stride read
        records: 10                    records: 10
        avg: 4515981.80                avg: 4418806.01
        std:  211192.32 (4.68%)        std:  212837.97 (4.82%)
        max: 4889287.28                max: 4686967.22
        min: 4210362.00                min: 4083041.84
        ==Random read                  ==Random read
        records: 10                    records: 10
        avg: 4410525.23                avg: 4387093.18
        std:  236693.22 (5.37%)        std:  235285.23 (5.36%)
        max: 4713698.47                max: 4669760.62
        min: 4057163.62                min: 3952002.16
        ==Mixed workload               ==Mixed workload
        records: 10                    records: 10
        avg:  243234.25                avg: 2818677.27
        std:   28505.07 (11.72%)       std:  195569.70 (6.94%)
        max:  288905.23                max: 3126478.11
        min:  212473.16                min: 2484150.69
        ==Random write                 ==Random write
        records: 10                    records: 10
        avg:  555887.07                avg: 1053057.79
        std:   70841.98 (12.74%)       std:   35195.36 (3.34%)
        max:  683188.28                max: 1096125.73
        min:  437299.57                min:  992481.93
        ==Pwrite                       ==Pwrite
        records: 10                    records: 10
        avg:  501745.93                avg:  810363.09
        std:   16373.54 (3.26%)        std:   19245.01 (2.37%)
        max:  518724.52                max:  833359.70
        min:  464208.73                min:  765501.87
        ==Pread                        ==Pread
        records: 10                    records: 10
        avg: 4539894.60                avg: 4457680.58
        std:  197094.66 (4.34%)        std:  188965.60 (4.24%)
        max: 4877170.38                max: 4689905.53
        min: 4226326.03                min: 4095739.72
      Signed-off-by: Minchan Kim <minchan@kernel.org>
      Cc: Nitin Gupta <ngupta@vflare.org>
      Tested-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Cc: Jerome Marchand <jmarchan@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • zram: remove workqueue for freeing removed pending slot · f614a9f4
      Committed by Minchan Kim
      Commit a0c516cb ("zram: don't grab mutex in zram_slot_free_noity")
      introduced free request pending code to avoid scheduling by mutex under
      spinlock and it was a mess which made code lenghty and increased
      overhead.
      
      Now, we don't need zram->lock any more to free slot so this patch
      reverts it and then, tb_lock should protect it.
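
      With tb_lock in place, the notify hook can free the slot directly; a
      sketch (stat accounting omitted):

      static void zram_slot_free_notify(struct request_queue *queue,
                                        unsigned long index)
      {
              struct zram *zram = queue->queuedata;
              struct zram_meta *meta = zram->meta;

              /* No mutex and no deferred work item: take the table lock
               * and free the slot on the spot. */
              write_lock(&meta->tb_lock);
              zram_free_page(zram, index);
              write_unlock(&meta->tb_lock);
      }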
      Signed-off-by: Minchan Kim <minchan@kernel.org>
      Cc: Nitin Gupta <ngupta@vflare.org>
      Tested-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Cc: Jerome Marchand <jmarchan@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • zram: introduce zram->tb_lock · 92967471
      Committed by Minchan Kim
      Currently, the zram table is protected by zram->lock, but that is a
      rather coarse-grained lock and it hurts scalability.
      
      Let's use our own rwlock instead of depending on zram->lock. This
      patch adds new locking, so it will obviously slow things down, but it
      is just preparation for removing the coarse-grained rw_semaphore
      (i.e., zram->lock), which is the hurdle for zram scalability.
      
      The final patch in this series will remove the lock from the read path
      and replace the rw_semaphore with a mutex in the write path. As a
      bonus, we can drop the pending-slot-free mess in the next patch.
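
      A sketch of the new fine-grained lock (struct abridged, locals shown
      for illustration):

      struct zram_meta {
              rwlock_t tb_lock;       /* protects table[] */
              struct table *table;    /* other fields elided */
      };

      /* Read side, e.g. while looking up a stored object: */
      unsigned long handle;
      u16 size;

      read_lock(&meta->tb_lock);
      handle = meta->table[index].handle;
      size = meta->table[index].size;
      read_unlock(&meta->tb_lock);

      /* Write side, e.g. when freeing or replacing an entry: */
      write_lock(&meta->tb_lock);
      zram_free_page(zram, index);
      write_unlock(&meta->tb_lock);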
      Signed-off-by: Minchan Kim <minchan@kernel.org>
      Cc: Nitin Gupta <ngupta@vflare.org>
      Tested-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Cc: Jerome Marchand <jmarchan@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • zram: use atomic operation for stat · deb0bdeb
      Committed by Minchan Kim
      Some of the fields in zram->stats are protected by zram->lock, which
      is rather coarse-grained, so let's use atomic operations instead of
      explicit locking.
      
      This patch prepares for removing the read path's dependency on
      zram->lock, a very coarse-grained rw_semaphore. Of course, the new
      atomic operations have a cost, so they might slow things down, but my
      12-CPU test couldn't spot any regression. All gains/losses are
      marginal, within stddev.
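
      A sketch of the conversion (the stat field is chosen for
      illustration):

      /* Before: the counter was only safe under zram->lock. */
      zram->stats.failed_writes++;

      /* After: the field becomes atomic64_t and needs no lock. */
      atomic64_inc(&zram->stats.failed_writes);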
      
        iozone -t -T -l 12 -u 12 -r 16K -s 60M -I +Z -V 0
      
        ==Initial write                ==Initial write
        records: 50                    records: 50
        avg:  412875.17                avg:  415638.23
        std:   38543.12 (9.34%)        std:   36601.11 (8.81%)
        max:  521262.03                max:  502976.72
        min:  343263.13                min:  351389.12
        ==Rewrite                      ==Rewrite
        records: 50                    records: 50
        avg:  416640.34                avg:  397914.33
        std:   60798.92 (14.59%)       std:   46150.42 (11.60%)
        max:  543057.07                max:  522669.17
        min:  304071.67                min:  316588.77
        ==Read                         ==Read
        records: 50                    records: 50
        avg: 4147338.63                avg: 4070736.51
        std:  179333.25 (4.32%)        std:  223499.89 (5.49%)
        max: 4459295.28                max: 4539514.44
        min: 3753057.53                min: 3444686.31
        ==Re-read                      ==Re-read
        records: 50                    records: 50
        avg: 4096706.71                avg: 4117218.57
        std:  229735.04 (5.61%)        std:  171676.25 (4.17%)
        max: 4430012.09                max: 4459263.94
        min: 2987217.80                min: 3666904.28
        ==Reverse Read                 ==Reverse Read
        records: 50                    records: 50
        avg: 4062763.83                avg: 4078508.32
        std:  186208.46 (4.58%)        std:  172684.34 (4.23%)
        max: 4401358.78                max: 4424757.22
        min: 3381625.00                min: 3679359.94
        ==Stride read                  ==Stride read
        records: 50                    records: 50
        avg: 4094933.49                avg: 4082170.22
        std:  185710.52 (4.54%)        std:  196346.68 (4.81%)
        max: 4478241.25                max: 4460060.97
        min: 3732593.23                min: 3584125.78
        ==Random read                  ==Random read
        records: 50                    records: 50
        avg: 4031070.04                avg: 4074847.49
        std:  192065.51 (4.76%)        std:  206911.33 (5.08%)
        max: 4356931.16                max: 4399442.56
        min: 3481619.62                min: 3548372.44
        ==Mixed workload               ==Mixed workload
        records: 50                    records: 50
        avg:  149925.73                avg:  149675.54
        std:    7701.26 (5.14%)        std:    6902.09 (4.61%)
        max:  191301.56                max:  175162.05
        min:  133566.28                min:  137762.87
        ==Random write                 ==Random write
        records: 50                    records: 50
        avg:  404050.11                avg:  393021.47
        std:   58887.57 (14.57%)       std:   42813.70 (10.89%)
        max:  601798.09                max:  524533.43
        min:  325176.99                min:  313255.34
        ==Pwrite                       ==Pwrite
        records: 50                    records: 50
        avg:  411217.70                avg:  411237.96
        std:   43114.99 (10.48%)       std:   33136.29 (8.06%)
        max:  530766.79                max:  471899.76
        min:  320786.84                min:  317906.94
        ==Pread                        ==Pread
        records: 50                    records: 50
        avg: 4154908.65                avg: 4087121.92
        std:  151272.08 (3.64%)        std:  219505.04 (5.37%)
        max: 4459478.12                max: 4435857.38
        min: 3730512.41                min: 3101101.67
      Signed-off-by: Minchan Kim <minchan@kernel.org>
      Cc: Nitin Gupta <ngupta@vflare.org>
      Tested-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Cc: Jerome Marchand <jmarchan@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • zram: remove unnecessary free · 874e3cdd
      Committed by Minchan Kim
      Commit a0c516cb ("zram: don't grab mutex in zram_slot_free_noity")
      introduced pending zram slot free in zram's write path in case of
      missing slot free by memory allocation failure in zram_slot_free_notify
      but it is not necessary because we have already freed the slot right
      before overwriting.
      Signed-off-by: Minchan Kim <minchan@kernel.org>
      Cc: Nitin Gupta <ngupta@vflare.org>
      Cc: Jerome Marchand <jmarchan@redhat.com>
      Tested-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • zram: delay pending free request in read path · 9b353db1
      Committed by Minchan Kim
      Sergey reported that we don't need to handle pending free requests on
      every I/O, so this patch removes that handling from the read path
      while keeping it in the write path.
      
      Consider the example below.
      
      The swap subsystem asks zram to free block "A" via
      swap_slot_free_notify, but zram queues the request without actually
      freeing the block. The swap subsystem then allocates block "A" for new
      data; the long-pending free request is finally handled, and zram
      blindly frees the new data on block "A".  :(
      
      That's why we cannot remove the pending-free handling right before a
      zram write.
      Signed-off-by: Minchan Kim <minchan@kernel.org>
      Reported-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Tested-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Cc: Nitin Gupta <ngupta@vflare.org>
      Cc: Jerome Marchand <jmarchan@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • zram: fix race between reset and flushing pending work · da4a0412
      Committed by Minchan Kim
      Dan and Sergey reported a race between reset and the flushing of
      pending work: reset can free zram->meta while zram_slot_free may still
      access zram->meta if a new request is added during the race window,
      leading to an oops.
      
      This patch moves the flush to after init_lock is taken, which prevents
      new requests and thus closes the race.
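
      A sketch of the fix, assuming the pending-free work item is
      zram->free_work (name assumed):

      static void zram_reset_device(struct zram *zram)
      {
              down_write(&zram->init_lock);
              /*
               * Flush pending slot-free work only after init_lock is held,
               * so no new request can queue work that would touch
               * zram->meta while it is being freed below.
               */
              flush_work(&zram->free_work);
              /* ... free zram->meta and reset stats ... */
              up_write(&zram->init_lock);
      }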
      Signed-off-by: Minchan Kim <minchan@kernel.org>
      Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
      Cc: Nitin Gupta <ngupta@vflare.org>
      Cc: Jerome Marchand <jmarchan@redhat.com>
      Tested-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • zram: add copyright · 7bfb3de8
      Committed by Minchan Kim
      Add my copyright to the zram source code which I maintain.
      Signed-off-by: Minchan Kim <minchan@kernel.org>
      Cc: Nitin Gupta <ngupta@vflare.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • zram: remove old private project comment · 49061236
      Committed by Minchan Kim
      Remove the old private compcache project address; upcoming patches
      should be sent to LKML, since the Linux kernel community will take
      care of the code from now on.
      Signed-off-by: Minchan Kim <minchan@kernel.org>
      Cc: Nitin Gupta <ngupta@vflare.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • zram: promote zram from staging · cd67e10a
      Committed by Minchan Kim
      Zram has lived in staging for a LONG LONG time and has been fixed and
      improved by many contributors, so the code is clean and stable now.
      Of course, there are lots of products using zram in real practice.
      
      Major TV companies have used zram as swap for two years now; recently
      our production team released an Android smartphone that uses zram as
      swap, and Android KitKat has started using zram for small-memory
      smartphones. Google reportedly ships zram in Chrome OS, CyanogenMod
      has used it for a long time, and I heard some distros have used the
      zram block device for tmpfs. I have seen reports from many other
      people as well; for example, Lubuntu has started to use it.
      
      The benefit of zram is very clear. In my experience, one benefit was
      removing jitter from a video application under background memory
      pressure. Part of that comes from the efficient memory usage that
      compression brings, but the bigger issue is whether swap is present in
      the system at all. Recent mobile platforms use Java, so there are many
      anonymous pages, but embedded systems are normally reluctant to use
      eMMC or SD cards as swap because of wear-leveling and latency issues.
      If we do not use swap, we can't reclaim anonymous pages, and in the
      end we can encounter OOM kill.  :(
      
      Even when real storage is available as swap, it is a problem too,
      because slow swap storage can end up making the system very
      unresponsive.
      
      Quote from Luigi at Google:
       "Since Chrome OS was mentioned: the main reason why we don't use swap
        to a disk (rotating or SSD) is because it doesn't degrade gracefully
        and leads to a bad interactive experience.  Generally we prefer to
        manage RAM at a higher level, by transparently killing and restarting
        processes.  But we noticed that zram is fast enough to be competitive
        with the latter, and it lets us make more efficient use of the
        available RAM."
      
      He announced it here:
      http://www.spinics.net/lists/linux-mm/msg57717.html
      
      Another use case is zram as a plain block device. Since zram is a
      block device, anyone can format and mount it, and some people on the
      internet already run /var/tmp on zram:
      http://forums.gentoo.org/viewtopic-t-838198-start-0.html
      
      Let's promote zram and enhance/maintain it instead of removing it.
      Signed-off-by: Minchan Kim <minchan@kernel.org>
      Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Acked-by: Nitin Gupta <ngupta@vflare.org>
      Acked-by: Pekka Enberg <penberg@kernel.org>
      Cc: Bob Liu <bob.liu@oracle.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Luigi Semenzato <semenzato@google.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Seth Jennings <sjenning@linux.vnet.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • zsmalloc: move it under mm · bcf1647d
      Committed by Minchan Kim
      This patch moves zsmalloc under the mm directory.
      
      Before that, this description explains why we needed a custom
      allocator.
      
      Zsmalloc is a new slab-based memory allocator for storing compressed
      pages.  It is designed for low fragmentation and a high allocation
      success rate for large objects, up to PAGE_SIZE in size.
      
      zsmalloc differs from the kernel slab allocator in two primary ways to
      achieve these design goals.
      
      zsmalloc never requires high order page allocations to back slabs, or
      "size classes" in zsmalloc terms.  Instead it allows multiple
      single-order pages to be stitched together into a "zspage" which backs
      the slab.  This allows for higher allocation success rate under memory
      pressure.
      
      Also, zsmalloc allows objects to span page boundaries within the
      zspage.  This allows for lower fragmentation than could be had with
      the kernel slab allocator for objects between PAGE_SIZE/2 and
      PAGE_SIZE.  With the kernel slab allocator, if a page compresses to
      60% of its original size, the memory savings gained through
      compression are lost in fragmentation, because another object of the
      same size can't be stored in the leftover space.
      
      This ability to span pages means that zsmalloc allocations are not
      directly addressable by the user.  The user is given a
      non-dereferenceable handle in response to an allocation request.  That
      handle must be mapped, using zs_map_object(), which returns a pointer
      to the mapped region that can be used.  The mapping is necessary since
      the object data may reside in two different noncontiguous pages.
      
      zsmalloc fulfills zram's allocation needs perfectly.
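
      A minimal sketch of the map/unmap flow described above (kernel-side
      usage; exact signatures may differ between versions):

      #include <linux/string.h>
      #include <linux/zsmalloc.h>

      static int zsmalloc_demo(const void *src, size_t size)
      {
              struct zs_pool *pool;
              unsigned long handle;
              void *dst;

              pool = zs_create_pool(GFP_KERNEL);
              if (!pool)
                      return -ENOMEM;

              handle = zs_malloc(pool, size); /* opaque handle, not a pointer */
              if (!handle) {
                      zs_destroy_pool(pool);
                      return -ENOMEM;
              }

              /* The object may span two noncontiguous pages, so it must be
               * mapped before it can be touched. */
              dst = zs_map_object(pool, handle, ZS_MM_WO);
              memcpy(dst, src, size);
              zs_unmap_object(pool, handle);

              zs_free(pool, handle);
              zs_destroy_pool(pool);
              return 0;
      }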
      
      [sjenning@linux.vnet.ibm.com: borrow Seth's quote]
      Signed-off-by: Minchan Kim <minchan@kernel.org>
      Acked-by: Nitin Gupta <ngupta@vflare.org>
      Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Bob Liu <bob.liu@oracle.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Luigi Semenzato <semenzato@google.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Seth Jennings <sjenning@linux.vnet.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>