- 09 12月, 2014 6 次提交
-
-
由 Dave Airlie 提交于
This adds fbdev/con support for tiled monitors, so that we only set a mode on the correct half of the monitor, or span the two halves if needed. v2: remove unneeded ERROR, fix | vs || Signed-off-by: NDave Airlie <airlied@redhat.com>
-
由 Dave Airlie 提交于
This takes the tiling info from the connector and exposes it to userspace, as a blob object in a connector property. The contents of the blob is ABI. v2: add property + function documentation. v3: move property setup from previous patch. add boilerplate + fix long line (Daniel) Reviewed-by: NDaniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: NDave Airlie <airlied@redhat.com>
-
由 Dave Airlie 提交于
This creates a tile group from DisplayID block, and stores the pieces of parsed info from the DisplayID block into the connector. v2: add missing signoff, add new connector bits to docs. v3: remove some debugging. Signed-off-by: NDave Airlie <airlied@redhat.com>
-
由 Dave Airlie 提交于
Logical ports are never going to have EDID changes, they are used for the internal ports on MST monitors. We cache the EDIDs from these to save time at MST probe. v2: drop misplace tile property line, meant for other patch. Signed-off-by: NDave Airlie <airlied@redhat.com>
-
由 Dave Airlie 提交于
A tile group is an identifier shared by a single monitor, DisplayID topology has 8 bytes we can use for this, just use those for now until something else comes up in the future. We assign these to an idr and use the idr to tell userspace what connectors are in the same tile group. DisplayID v1.3 says the serial number must be unique for displays from the same manufacturer. v2: destroy idr (dvdhrm) add docbook (danvet) airlied:- not sure how to make docbook add fns to tile group section. v3: fix missing unlock. Reviewed-by: NDaniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: NDave Airlie <airlied@redhat.com>
-
由 Dave Airlie 提交于
These are just taken from the DisplayID v1.3 spec, and the DDC spec. v2: use __packed (Jani) Signed-off-by: NDave Airlie <airlied@redhat.com>
-
- 05 12月, 2014 1 次提交
-
-
由 Masahiro Yamada 提交于
A typo "header=y" was introduced by commit 7071cf7f ("uapi: add missing network related headers to kbuild"). Signed-off-by: NMasahiro Yamada <yamada.m@jp.panasonic.com> Cc: Stephen Hemminger <stephen@networkplumber.org> Cc: David Miller <davem@davemloft.net> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 04 12月, 2014 1 次提交
-
-
由 Christian König 提交于
This patch adds an optional list_head parameter to ttm_eu_reserve_buffers. If specified duplicates in the execbuf list are no longer reported as errors, but moved to this list instead. Reviewed-by: NThomas Hellstrom <thellstrom@vmware.com> Signed-off-by: NChristian König <christian.koenig@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
-
- 27 11月, 2014 6 次提交
-
-
由 Rob Clark 提交于
Signed-off-by: NRob Clark <robdclark@gmail.com> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
由 Rob Clark 提交于
Add helper macros to iterate the current, or incoming set of planes attached to a crtc. These helpers are only available for drivers converted to use atomic-helpers. Signed-off-by: NRob Clark <robdclark@gmail.com> [danvet: Squash in fixup from Rob to move the planemask iterator to drm_crtc.h and document it. That one is needed by the atomic ioctl so can't be in a helper library.] Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
由 Rob Clark 提交于
Chasing plane->state->crtc of planes that are *not* part of the same atomic update is racy, making it incredibly awkward (or impossible) to do something simple like iterate over all planes and figure out which ones are attached to a crtc. Solve this by adding a bitmask of currently attached planes in the crtc-state. Note that the transitional helpers do not maintain the plane_mask. But they only support the legacy ioctls, which have sufficient brute-force locking around plane updates that they can continue to loop over all planes to see what is attached to a crtc the old way. Signed-off-by: NRob Clark <robdclark@gmail.com> [danvet: - Drop comments about locking in set_crtc_for_plane since they're a bit misleading - we already should hold lock for the current crtc. - Also WARN_ON if get_state on the old crtc fails since that should have been done already. - Squash in fixup to check get_plane_state return value, reported by Dan Carpenter and acked by Rob Clark.] Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
由 Willem de Bruijn 提交于
TCP timestamping introduced MSG_ERRQUEUE handling for TCP sockets. If the socket is of family AF_INET6, call ipv6_recv_error instead of ip_recv_error. This change is more complex than a single branch due to the loadable ipv6 module. It reuses a pre-existing indirect function call from ping. The ping code is safe to call, because it is part of the core ipv6 module and always present when AF_INET6 sockets are active. Fixes: 4ed2d765 (net-timestamp: TCP timestamping) Signed-off-by: NWillem de Bruijn <willemb@google.com> ---- It may also be worthwhile to add WARN_ON_ONCE(sk->family == AF_INET6) to ip_recv_error. Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Lars-Peter Clausen 提交于
The drm_get_edid() function performs direct I2C accesses to read EDID blocks, assuming that the monitor DDC interface is directly connected to the I2C bus. It can't thus be used with HDMI encoders that control the DDC bus and expose EDID blocks through a different interface. Refactor drm_do_get_edid() to take a block read callback function instead of an I2C adapter, and export it for direct use by drivers. As in the general case the DDC bus is accessible by the kernel at the I2C level, drivers must make all reasonable efforts to expose it as an I2C adapter and use drm_get_edid() instead of abusing this function. Signed-off-by: NLars-Peter Clausen <lars@metafoo.de> Signed-off-by: NLaurent Pinchart <laurent.pinchart+renesas@ideasonboard.com> Reviewed-by: NRob Clark <robdclark@gmail.com> Reviewed-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
由 Laurent Pinchart 提交于
All platforms now instantiate the DU through DT, platform data support isn't needed anymore. Signed-off-by: NLaurent Pinchart <laurent.pinchart+renesas@ideasonboard.com>
-
- 26 11月, 2014 1 次提交
-
-
由 Ard Biesheuvel 提交于
This reverts commit 85c8555f ("KVM: check for !is_zero_pfn() in kvm_is_mmio_pfn()") and renames the function to kvm_is_reserved_pfn. The problem being addressed by the patch above was that some ARM code based the memory mapping attributes of a pfn on the return value of kvm_is_mmio_pfn(), whose name indeed suggests that such pfns should be mapped as device memory. However, kvm_is_mmio_pfn() doesn't do quite what it says on the tin, and the existing non-ARM users were already using it in a way which suggests that its name should probably have been 'kvm_is_reserved_pfn' from the beginning, e.g., whether or not to call get_page/put_page on it etc. This means that returning false for the zero page is a mistake and the patch above should be reverted. Signed-off-by: NArd Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 25 11月, 2014 4 次提交
-
-
由 Thierry Reding 提交于
This header file makes use of a bunch of structures declared in the drm_crtc.h header file. Include that to make sure the drm_atomic.h header can be included standalone. Signed-off-by: NThierry Reding <treding@nvidia.com> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
由 Thierry Reding 提交于
This header uses a bunch of declarations from the drm/drm_crtc.h header, so make sure to include that as well so that drm_atomic_helper.h can be included standalone. Reviewed-by: NDaniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: NThierry Reding <treding@nvidia.com> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
由 Thierry Reding 提交于
The plane helpers aren't pulled into the DocBook yet, so these weren't noticed. Reviewed-by: NDaniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: NThierry Reding <treding@nvidia.com> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
由 Thierry Reding 提交于
In most situations it will be useful to have the old state passed to the ->atomic_update() callback. For example if a plane is being disabled the new state's .crtc field will be NULL, but some drivers may rely on this field to program the CRTCs registers. v2: rename variable to old_plane_state and remove redundant comment as suggested by Daniel Vetter, remove an Exynos hunk that doesn't apply to drm-next and add a hunk for pending MSM mdp5 changes Reviewed-by: NDaniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: NThierry Reding <treding@nvidia.com> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
- 24 11月, 2014 2 次提交
-
-
由 Benjamin Herrenschmidt 提交于
This can be set by quirks/drivers to be used by the architecture code that assigns the MSI addresses. We additionally add verification in the core MSI code that the values assigned by the architecture do satisfy the limitation in order to fail gracefully if they don't (ie. the arch hasn't been updated to deal with that quirk yet). Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org> CC: <stable@vger.kernel.org> Acked-by: NBjorn Helgaas <bhelgaas@google.com>
-
由 Tejun Heo 提交于
While decoupling ATOMIC and DEAD flags, f47ad457 ("percpu_ref: decouple switching to percpu mode and reinit") updated __ref_is_percpu() so that it only tests ATOMIC flag to determine whether the ref is in percpu mode or not; however, while DEAD implies ATOMIC, the two flags are set separately during percpu_ref_kill() and if __ref_is_percpu() races percpu_ref_kill(), it may see DEAD w/o ATOMIC. Because __ref_is_percpu() returns @ref->percpu_count_ptr value verbatim as the percpu pointer after testing ATOMIC, the pointer may now be contaminated with the DEAD flag. This can be fixed by clearing the flag bits before returning the pointer which was the fix proposed by Shaohua; however, as DEAD implies ATOMIC, we can just test for both flags at once and avoid the explicit masking. Update __ref_is_percpu() so that it tests that both ATOMIC and DEAD are clear before returning @ref->percpu_count_ptr as the percpu pointer. Signed-off-by: NTejun Heo <tj@kernel.org> Reported-and-Reviewed-by: NShaohua Li <shli@kernel.org> Link: http://lkml.kernel.org/r/995deb699f5b873c45d667df4add3b06f73c2c25.1416638887.git.shli@kernel.org Fixes: f47ad457 ("percpu_ref: decouple switching to percpu mode and reinit")
-
- 21 11月, 2014 2 次提交
-
-
由 Jussi Laako 提交于
This patch fixes XMOS DSD sample format to DSD_U32_BE and also adds DSD_U16_BE and DSD_U32_BE sample formats. Signed-off-by: NJussi Laako <jussi@sonarnerd.net> Acked-by: NJurgen Kramer <gtmkramer@xs4all.nl> Signed-off-by: NTakashi Iwai <tiwai@suse.de>
-
由 Thomas Hellstrom 提交于
It happens on occasion that developers of generic user-space applications abuse the dumb buffer API to get hold of drm buffers that they can both mmap() and use for GPU acceleration, using the assumptions that dumb buffers and buffers available for GPU are a) The same type and can be aribtrarily type-casted. b) fully coherent. This patch makes the most widely used drivers warn nicely when that happens, the next step will be to fail. v2: Move drmP.h changes to drm_gem.h. Fix Radeon dumb mmap breakage. Signed-off-by: NThomas Hellstrom <thellstrom@vmware.com> Acked-by: NDaniel Vetter <daniel.vetter@ffwll.ch> Acked-by: NAlex Deucher <alexander.deucher@amd.com> Signed-off-by: NDave Airlie <airlied@redhat.com>
-
- 20 11月, 2014 2 次提交
-
-
由 Daniel Vetter 提交于
I guess for hysterical raisins this was meant to be the way to read blob properties. But that's done with the two-stage approach which uses separate blob kms object and the special-purpose get_blob ioctl. Shipping userspace seems to have never relied on this, and the kernel also never put any blob thing onto that property. And nowadays it would blow up, e.g. in drm_property_destroy. Also it makes no sense to return values in an ioctl that only returns metadata about everything. So let's ditch all the internal code for the blob list, rename the list to be unambiguous and sprinkle comments all over the place to explain this peculiar piece of api. v2: Squash in fixup from Rob to remove now unused variables. Cc: Rob Clark <robdclark@gmail.com> Signed-off-by: NDaniel Vetter <daniel.vetter@intel.com> Reviewed-by: NRob Clark <robdclark@gmail.com> Signed-off-by: NDave Airlie <airlied@redhat.com>
-
由 Daniel Vetter 提交于
Yet another fallout from not considering DP MST hotplug. With the previous patches we have stable indices, but it might still happen that a connector gets added between when we allocate the array and when we actually add a connector. Especially when we back off due to ww mutex contention or similar issues. So store the sizes of the arrays in struct drm_atomic_state and double check them. We don't really care about races except that we want to use a consistent value, so ACCESS_ONCE is all we need. And if we indeed notice that we'd overrun the array then just give up and restart the entire ioctl. Signed-off-by: NDaniel Vetter <daniel.vetter@intel.com> Reviewed-by: NRob Clark <robdclark@gmail.com> Signed-off-by: NDave Airlie <airlied@redhat.com>
-
- 19 11月, 2014 1 次提交
-
-
由 Joe Stringer 提交于
Suggested-by: NOr Gerlitz <ogerlitz@mellanox.com> Signed-off-by: NJoe Stringer <joestringer@nicira.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 18 11月, 2014 3 次提交
-
-
由 Dong Aisheng 提交于
The CAN device drivers can use can_is_canfd_skb() to check if the frame to send is on CAN FD mode or normal CAN mode. Acked-by: NOliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: NDong Aisheng <b29396@freescale.com> Signed-off-by: NMarc Kleine-Budde <mkl@pengutronix.de>
-
由 James Hogan 提交于
Commit 79c6ab50 (clk: divider: add CLK_DIVIDER_READ_ONLY flag) in v3.16 introduced the CLK_DIVIDER_READ_ONLY flag which caused the recalc_rate() and round_rate() clock callbacks to be omitted. However using this flag has the unfortunate side effect of causing the clock recalculation code when a clock rate change is attempted to always treat it as a pass-through clock, i.e. with a fixed divide of 1, which may not be the case. Child clock rates are then recalculated using the wrong parent rate. Therefore instead of dropping the recalc_rate() and round_rate() callbacks, alter clk_divider_bestdiv() to always report the current divider as the best divider so that it is never altered. For me the read only clock was the system clock, which divided the PLL rate by 2, from which both the UART and the SPI clocks were divided. Initial setting of the UART rate set it correctly, but when the SPI clock was set, the other child clocks were miscalculated. The UART clock was recalculated using the PLL rate as the parent rate, resulting in a UART new_rate of double what it should be, and a UART which spewed forth garbage when the rate changes were propagated. Signed-off-by: NJames Hogan <james.hogan@imgtec.com> Cc: Thomas Abraham <thomas.ab@samsung.com> Cc: Tomasz Figa <t.figa@samsung.com> Cc: Max Schwarz <max.schwarz@online.de> Cc: <stable@vger.kernel.org> # v3.16+ Acked-by: NHaojian Zhuang <haojian.zhuang@gmail.com> Signed-off-by: NMichael Turquette <mturquette@linaro.org>
-
由 Georgi Djakov 提交于
There is a duplication in a clock name for apq8084 platform that causes the following warning: "RBCPR_CLK_SRC" redefined Resolve this by adding a MMSS_ prefix to this clock and making its name coherent with msm8974 platform. Fixes: 2b46cd23 ("clk: qcom: Add APQ8084 Multimedia Clock Controller (MMCC) support") Signed-off-by: NGeorgi Djakov <gdjakov@mm-sol.com> Reviewed-by: NStephen Boyd <sboyd@codeaurora.org> Signed-off-by: NMichael Turquette <mturquette@linaro.org>
-
- 16 11月, 2014 3 次提交
-
-
由 Peter Zijlstra 提交于
While looking over the cpu-timer code I found that we appear to add the delta for the calling task twice, through: cpu_timer_sample_group() thread_group_cputimer() thread_group_cputime() times->sum_exec_runtime += task_sched_runtime(); *sample = cputime.sum_exec_runtime + task_delta_exec(); Which would make the sample run ahead, making the sleep short. Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Stanislaw Gruszka <sgruszka@redhat.com> Cc: Christoph Lameter <cl@linux.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Rik van Riel <riel@redhat.com> Cc: Tejun Heo <tj@kernel.org> Link: http://lkml.kernel.org/r/20141112113737.GI10476@twins.programming.kicks-ass.netSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Maxime COQUELIN 提交于
On some 32 bits architectures, including x86, GENMASK(31, 0) returns 0 instead of the expected ~0UL. This is the same on some 64 bits architectures with GENMASK_ULL(63, 0). This is due to an overflow in the shift operand, 1 << 32 for GENMASK, 1 << 64 for GENMASK_ULL. Reported-by: NEric Paire <eric.paire@st.com> Suggested-by: NRasmus Villemoes <linux@rasmusvillemoes.dk> Signed-off-by: NMaxime Coquelin <maxime.coquelin@st.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: <stable@vger.kernel.org> # v3.13+ Cc: linux@rasmusvillemoes.dk Cc: gong.chen@linux.intel.com Cc: John Sullivan <jsrhbz@kanargh.force9.co.uk> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Theodore Ts'o <tytso@mit.edu> Fixes: 10ef6b0d ("bitops: Introduce a more generic BITMASK macro") Link: http://lkml.kernel.org/r/1415267659-10563-1-git-send-email-maxime.coquelin@st.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Cristina Ciocan 提交于
The direction field is set on 7 bits, thus we need to AND it with 0111 111 mask in order to retrieve it, that is 0x7F, not 0xCF as it is now. Fixes: ade7ef7b (staging:iio: Differential channel handling) Signed-off-by: NCristina Ciocan <cristina.ciocan@intel.com> Cc: <Stable@vger.kernel.org> Signed-off-by: NJonathan Cameron <jic23@kernel.org>
-
- 15 11月, 2014 5 次提交
-
-
由 Dave Airlie 提交于
Virtual GPUs would like to give the guest some indication where on the screen the outputs are layed out. So far we only provide modes, these properties could be exposed to userspace so the desktop environment could use them as hints to set the correct offsets. v2: rename properties to be more consistent. Signed-off-by: NDave Airlie <airlied@redhat.com>
-
由 Boris BREZILLON 提交于
Now that we're using lists instead of kfifo to store drm flip-work tasks we do not need the size parameter passed to drm_flip_work_init function anymore. Moreover this function cannot fail anymore, we can thus remove the return code. Modify drm_flip_work_init users to take account of these changes. [airlied: fixed two unused variable warnings] Signed-off-by: NBoris BREZILLON <boris.brezillon@free-electrons.com> Reviewed-by: NRob Clark <robdclark@gmail.com> Signed-off-by: NDave Airlie <airlied@redhat.com>
-
由 Boris BREZILLON 提交于
Make use of lists instead of kfifo in order to dynamically allocate task entry when someone require some delayed work, and thus preventing drm_flip_work_queue from directly calling func instead of queuing this call. This allow drm_flip_work_queue to be safely called even within irq handlers. Add new helper functions to allocate a flip work task and queue it when needed. This prevents allocating data within irq context (which might impact the time spent in the irq handler). Signed-off-by: NBoris BREZILLON <boris.brezillon@free-electrons.com> Reviewed-by: NRob Clark <robdclark@gmail.com> Signed-off-by: NDave Airlie <airlied@redhat.com>
-
由 Joe Stringer 提交于
Most NICs that report NETIF_F_GSO_UDP_TUNNEL support VXLAN, and not other UDP-based encapsulation protocols where the format and size of the header differs. This patch implements a generic ndo_gso_check() for VXLAN which will only advertise GSO support when the skb looks like it contains VXLAN (or no UDP tunnelling at all). Implementation shamelessly stolen from Tom Herbert: http://thread.gmane.org/gmane.linux.network/332428/focus=333111Signed-off-by: NJoe Stringer <joestringer@nicira.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Vincent BENAYOUN 提交于
There could be a signed overflow in the following code. The expression, (32-logmask) is comprised between 0 and 31 included. It may be equal to 31. In such a case the left shift will produce a signed integer overflow. According to the C99 Standard, this is an undefined behavior. A simple fix is to replace the signed int 1 with the unsigned int 1U. Signed-off-by: NVincent BENAYOUN <vincent.benayoun@trust-in-soft.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 14 11月, 2014 3 次提交
-
-
由 Chris Wilson 提交于
Currently objects for which the hardware needs a contiguous physical address are allocated a shadow backing storage to satisfy the contraint. This shadow buffer is not wired into the normal obj->pages and so the physical object is incoherent with accesses via the GPU, GTT and CPU. By setting up the appropriate scatter-gather table, we can allow userspace to access the physical object via either a GTT mmaping of or by rendering into the GEM bo. However, keeping the CPU mmap of the shmemfs backing storage coherent with the contiguous shadow is not yet possible. Fortuituously, CPU mmaps of objects requiring physical addresses are not expected to be coherent anyway. This allows the physical constraint of the GEM object to be transparent to userspace and allow it to efficiently render into or update them via the GTT and GPU. v2: Fix leak of pci handle spotted by Ville v3: Remove the now duplicate call to detach_phys_object during free. v4: Wait for rendering before pwrite. As this patch makes it possible to render into the phys object, we should make it correct as well! Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: NVille Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: NRodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
-
由 Tang Chen 提交于
In free_area_init_core(), zone->managed_pages is set to an approximate value for lowmem, and will be adjusted when the bootmem allocator frees pages into the buddy system. But free_area_init_core() is also called by hotadd_new_pgdat() when hot-adding memory. As a result, zone->managed_pages of the newly added node's pgdat is set to an approximate value in the very beginning. Even if the memory on that node has node been onlined, /sys/device/system/node/nodeXXX/meminfo has wrong value: hot-add node2 (memory not onlined) cat /sys/device/system/node/node2/meminfo Node 2 MemTotal: 33554432 kB Node 2 MemFree: 0 kB Node 2 MemUsed: 33554432 kB Node 2 Active: 0 kB This patch fixes this problem by reset node managed pages to 0 after hot-adding a new node. 1. Move reset_managed_pages_done from reset_node_managed_pages() to reset_all_zones_managed_pages() 2. Make reset_node_managed_pages() non-static 3. Call reset_node_managed_pages() in hotadd_new_pgdat() after pgdat is initialized Signed-off-by: NTang Chen <tangchen@cn.fujitsu.com> Signed-off-by: NYasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: <stable@vger.kernel.org> [3.16+] Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Joonsoo Kim 提交于
Before describing bugs itself, I first explain definition of freepage. 1. pages on buddy list are counted as freepage. 2. pages on isolate migratetype buddy list are *not* counted as freepage. 3. pages on cma buddy list are counted as CMA freepage, too. Now, I describe problems and related patch. Patch 1: There is race conditions on getting pageblock migratetype that it results in misplacement of freepages on buddy list, incorrect freepage count and un-availability of freepage. Patch 2: Freepages on pcp list could have stale cached information to determine migratetype of buddy list to go. This causes misplacement of freepages on buddy list and incorrect freepage count. Patch 4: Merging between freepages on different migratetype of pageblocks will cause freepages accouting problem. This patch fixes it. Without patchset [3], above problem doesn't happens on my CMA allocation test, because CMA reserved pages aren't used at all. So there is no chance for above race. With patchset [3], I did simple CMA allocation test and get below result: - Virtual machine, 4 cpus, 1024 MB memory, 256 MB CMA reservation - run kernel build (make -j16) on background - 30 times CMA allocation(8MB * 30 = 240MB) attempts in 5 sec interval - Result: more than 5000 freepage count are missed With patchset [3] and this patchset, I found that no freepage count are missed so that I conclude that problems are solved. On my simple memory offlining test, these problems also occur on that environment, too. This patch (of 4): There are two paths to reach core free function of buddy allocator, __free_one_page(), one is free_one_page()->__free_one_page() and the other is free_hot_cold_page()->free_pcppages_bulk()->__free_one_page(). Each paths has race condition causing serious problems. At first, this patch is focused on first type of freepath. And then, following patch will solve the problem in second type of freepath. In the first type of freepath, we got migratetype of freeing page without holding the zone lock, so it could be racy. There are two cases of this race. 1. pages are added to isolate buddy list after restoring orignal migratetype CPU1 CPU2 get migratetype => return MIGRATE_ISOLATE call free_one_page() with MIGRATE_ISOLATE grab the zone lock unisolate pageblock release the zone lock grab the zone lock call __free_one_page() with MIGRATE_ISOLATE freepage go into isolate buddy list, although pageblock is already unisolated This may cause two problems. One is that we can't use this page anymore until next isolation attempt of this pageblock, because freepage is on isolate buddy list. The other is that freepage accouting could be wrong due to merging between different buddy list. Freepages on isolate buddy list aren't counted as freepage, but ones on normal buddy list are counted as freepage. If merge happens, buddy freepage on normal buddy list is inevitably moved to isolate buddy list without any consideration of freepage accouting so it could be incorrect. 2. pages are added to normal buddy list while pageblock is isolated. It is similar with above case. This also may cause two problems. One is that we can't keep these freepages from being allocated. Although this pageblock is isolated, freepage would be added to normal buddy list so that it could be allocated without any restriction. And the other problem is same as case 1, that it, incorrect freepage accouting. This race condition would be prevented by checking migratetype again with holding the zone lock. Because it is somewhat heavy operation and it isn't needed in common case, we want to avoid rechecking as much as possible. So this patch introduce new variable, nr_isolate_pageblock in struct zone to check if there is isolated pageblock. With this, we can avoid to re-check migratetype in common case and do it only if there is isolated pageblock or migratetype is MIGRATE_ISOLATE. This solve above mentioned problems. Changes from v3: Add one more check in free_one_page() that checks whether migratetype is MIGRATE_ISOLATE or not. Without this, abovementioned case 1 could happens. Signed-off-by: NJoonsoo Kim <iamjoonsoo.kim@lge.com> Acked-by: NMinchan Kim <minchan@kernel.org> Acked-by: NMichal Nazarewicz <mina86@mina86.com> Acked-by: NVlastimil Babka <vbabka@suse.cz> Cc: "Kirill A. Shutemov" <kirill@shutemov.name> Cc: Mel Gorman <mgorman@suse.de> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> Cc: Tang Chen <tangchen@cn.fujitsu.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com> Cc: Wen Congyang <wency@cn.fujitsu.com> Cc: Marek Szyprowski <m.szyprowski@samsung.com> Cc: Laura Abbott <lauraa@codeaurora.org> Cc: Heesub Shin <heesub.shin@samsung.com> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> Cc: Ritesh Harjani <ritesh.list@gmail.com> Cc: Gioh Kim <gioh.kim@lge.com> Cc: <stable@vger.kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-