- 01 4月, 2015 5 次提交
-
-
由 Jens Axboe 提交于
Usually the admin queue depth of 64 is plenty, but for some use cases we really need it larger. Examples are use cases like MAT, where you have to touch all of NAND for init/format like purposes. In those cases, we see a good 2x increase with an increased queue depth. Signed-off-by: NJens Axboe <axboe@fb.com> Acked-by: NKeith Busch <keith.busch@intel.com>
-
由 Murali Iyer 提交于
PRP list calculation is supposed to be based on device's page size. Systems with page size larger than device's page size cause corruption to the name space as well as system memory with out this fix. Systems like x86 might not experience this issue because it uses PAGE_SIZE of 4K where as powerpc uses PAGE_SIZE of 64k while NVMe device's page size varies depending upon the vendor. Signed-off-by: NMurali Iyer <mniyer@us.ibm.com> Signed-off-by: NBrian King <brking@linux.vnet.ibm.com> Acked-by: NKeith Busch <keith.busch@intel.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Keith Busch 提交于
The driver may issue commands to a device that may never return, so its request_queue could always have active requests while the controller is running. Waiting for the queue to freeze could block forever, which is what blk-mq's hot cpu notification handler was doing when nvme drives were in use. This has the nvme driver make the asynchronous event command's tag reserved and does not keep the request active. We can't have more than one since the request is released back to the request_queue before the command is completed. Having only one avoids potential tag collisions, and reserving the tag for this purpose prevents other admin tasks from reusing the tag. I also couldn't think of a scenario where issuing AEN requests single depth is worse than issuing them in batches, so I don't think we lose anything with this change. As an added bonus, doing it this way removes "Cancelling I/O" warnings observed when unbinding the nvme driver from a device. Reported-by: NYigal Korman <yigal@plexistor.com> Signed-off-by: NKeith Busch <keith.busch@intel.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Chong Yuan 提交于
Remove unused mask in nvme_alloc_iod Signed-off-by: NChong Yuan <chong.yuan@memblaze.com> Reviewed-by: NWenbo Wang <wenbo.wang@memblaze.com> Acked-by: NKeith Busch <keith.busch@intel.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Keith Busch 提交于
This fixes a race accessing an invalid address when a controller's admin queue is in use during a reset for failure or hot removal occurs. The admin queue will be frozen to prevent new users from entering prior to the doorbell queue being unmapped. Signed-off-by: NKeith Busch <keith.busch@intel.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
- 25 3月, 2015 2 次提交
-
-
由 David Rientjes 提交于
Mempools created for slab caches should use mempool_create_slab_pool(). Cc: Lars Ellenberg <drbd-dev@lists.linbit.com> Cc: Jens Axboe <axboe@fb.com> Signed-off-by: NDavid Rientjes <rientjes@google.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 David Rientjes 提交于
mempool_alloc() does not support __GFP_ZERO since elements may come from memory that has already been released by mempool_free(). Remove __GFP_ZERO from mempool_alloc() in drbd_req_new() and properly initialize it to 0. Cc: Lars Ellenberg <drbd-dev@lists.linbit.com> Cc: Jens Axboe <axboe@fb.com> Signed-off-by: NDavid Rientjes <rientjes@google.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
- 18 3月, 2015 1 次提交
-
-
由 Ross Zwisler 提交于
Nick Piggin is currently listed as the maintainer of BRD in MAINTAINERS, but the mails sent to the listed address are returned as undeliverable. Update the maintainer for BRD to be Jens Axboe, since patches for BRD flow up through him. Signed-off-by: NRoss Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
- 13 3月, 2015 24 次提交
-
-
由 Keith Busch 提交于
Return -EBUSY if we're unable to enter a queue immediately when allocating a blk-mq request without __GFP_WAIT. Signed-off-by: NKeith Busch <keith.busch@intel.com> Signed-off-by: NMike Snitzer <snitzer@redhat.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Mike Snitzer 提交于
Rename blk_mq_run_queues to blk_mq_run_hw_queues, add async argument, and export it. DM's suspend support must be able to run the queue without starting stopped hw queues. Signed-off-by: NMike Snitzer <snitzer@redhat.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Mike Snitzer 提交于
Add a variant of blk_mq_init_queue that allows a previously allocated queue to be initialized. blk_mq_init_allocated_queue models blk_init_allocated_queue -- which was also created for DM's use. DM's approach to device creation requires a placeholder request_queue be allocated for use with alloc_dev() but the decision about what type of request_queue will be ultimately created is deferred until all component devices referenced in the DM table are processed to determine the table type (request-based, blk-mq request-based, or bio-based). Also, because of DM's late finalization of the request_queue type the call to blk_mq_register_disk() doesn't happen during alloc_dev(). Must export blk_mq_register_disk() so that DM can backfill the 'mq' dir once the blk-mq queue is fully allocated. Signed-off-by: NMike Snitzer <snitzer@redhat.com> Reviewed-by: NMing Lei <ming.lei@canonical.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Jens Axboe 提交于
-
由 Mike Snitzer 提交于
If percpu_ref_init() fails the allocated q and hctxs must get cleaned up; using 'err_map' doesn't allow that to happen. Signed-off-by: NMike Snitzer <snitzer@redhat.com> Reviewed-by: NMing Lei <ming.lei@canonical.com> Cc: stable@kernel.org Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Linus Torvalds 提交于
Merge misc fixes from Andrew Morton: "13 fixes" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: memcg: disable hierarchy support if bound to the legacy cgroup hierarchy mm: reorder can_do_mlock to fix audit denial kasan, module: move MODULE_ALIGN macro into <linux/moduleloader.h> kasan, module, vmalloc: rework shadow allocation for modules fanotify: fix event filtering with FAN_ONDIR set mm/nommu.c: export symbol max_mapnr arch/c6x/include/asm/pgtable.h: define dummy pgprot_writecombine for !MMU nilfs2: fix deadlock of segment constructor during recovery mm: cma: fix CMA aligned offset calculation mm, hugetlb: close race when setting PageTail for gigantic pages mm, oom: do not fail __GFP_NOFAIL allocation if oom killer is disabled drivers/rtc/rtc-s3c.c: add .needs_src_clk to s3c6410 RTC data ocfs2: make append_dio an incompat feature
-
由 Vladimir Davydov 提交于
If the memory cgroup controller is initially mounted in the scope of the default cgroup hierarchy and then remounted to a legacy hierarchy, it will still have hierarchy support enabled, which is incorrect. We should disable hierarchy support if bound to the legacy cgroup hierarchy. Signed-off-by: NVladimir Davydov <vdavydov@parallels.com> Signed-off-by: NJohannes Weiner <hannes@cmpxchg.org> Acked-by: NMichal Hocko <mhocko@suse.cz> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Jeff Vander Stoep 提交于
A userspace call to mmap(MAP_LOCKED) may result in the successful locking of memory while also producing a confusing audit log denial. can_do_mlock checks capable and rlimit. If either of these return positive can_do_mlock returns true. The capable check leads to an LSM hook used by apparmour and selinux which produce the audit denial. Reordering so rlimit is checked first eliminates the denial on success, only recording a denial when the lock is unsuccessful as a result of the denial. Signed-off-by: NJeff Vander Stoep <jeffv@google.com> Acked-by: NNick Kralevich <nnk@google.com> Cc: Jeff Vander Stoep <jeffv@google.com> Cc: Sasha Levin <sasha.levin@oracle.com> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Rik van Riel <riel@redhat.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Paul Cassella <cassella@cray.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Andrey Ryabinin 提交于
include/linux/moduleloader.h is more suitable place for this macro. Also change alignment to PAGE_SIZE for CONFIG_KASAN=n as such alignment already assumed in several places. Signed-off-by: NAndrey Ryabinin <a.ryabinin@samsung.com> Cc: Dmitry Vyukov <dvyukov@google.com> Acked-by: NRusty Russell <rusty@rustcorp.com.au> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Andrey Ryabinin 提交于
Current approach in handling shadow memory for modules is broken. Shadow memory could be freed only after memory shadow corresponds it is no longer used. vfree() called from interrupt context could use memory its freeing to store 'struct llist_node' in it: void vfree(const void *addr) { ... if (unlikely(in_interrupt())) { struct vfree_deferred *p = this_cpu_ptr(&vfree_deferred); if (llist_add((struct llist_node *)addr, &p->list)) schedule_work(&p->wq); Later this list node used in free_work() which actually frees memory. Currently module_memfree() called in interrupt context will free shadow before freeing module's memory which could provoke kernel crash. So shadow memory should be freed after module's memory. However, such deallocation order could race with kasan_module_alloc() in module_alloc(). Free shadow right before releasing vm area. At this point vfree()'d memory is not used anymore and yet not available for other allocations. New VM_KASAN flag used to indicate that vm area has dynamically allocated shadow memory so kasan frees shadow only if it was previously allocated. Signed-off-by: NAndrey Ryabinin <a.ryabinin@samsung.com> Acked-by: NRusty Russell <rusty@rustcorp.com.au> Cc: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Suzuki K. Poulose 提交于
With FAN_ONDIR set, the user can end up getting events, which it hasn't marked. This was revealed with fanotify04 testcase failure on Linux-4.0-rc1, and is a regression from 3.19, revealed with 66ba93c0 ("fanotify: don't set FAN_ONDIR implicitly on a marks ignored mask"). # /opt/ltp/testcases/bin/fanotify04 [ ... ] fanotify04 7 TPASS : event generated properly for type 100000 fanotify04 8 TFAIL : fanotify04.c:147: got unexpected event 30 fanotify04 9 TPASS : No event as expected The testcase sets the adds the following marks : FAN_OPEN | FAN_ONDIR for a fanotify on a dir. Then does an open(), followed by close() of the directory and expects to see an event FAN_OPEN(0x20). However, the fanotify returns (FAN_OPEN|FAN_CLOSE_NOWRITE(0x10)). This happens due to the flaw in the check for event_mask in fanotify_should_send_event() which does: if (event_mask & marks_mask & ~marks_ignored_mask) return true; where, event_mask == (FAN_ONDIR | FAN_CLOSE_NOWRITE), marks_mask == (FAN_ONDIR | FAN_OPEN), marks_ignored_mask == 0 Fix this by masking the outgoing events to the user, as we already take care of FAN_ONDIR and FAN_EVENT_ON_CHILD. Signed-off-by: NSuzuki K. Poulose <suzuki.poulose@arm.com> Tested-by: NLino Sanfilippo <LinoSanfilippo@gmx.de> Reviewed-by: NJan Kara <jack@suse.cz> Cc: Eric Paris <eparis@redhat.com> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 gchen gchen 提交于
Several modules may need max_mapnr, so export, the related error with allmodconfig under c6x: MODPOST 3327 modules ERROR: "max_mapnr" [fs/pstore/ramoops.ko] undefined! ERROR: "max_mapnr" [drivers/media/v4l2-core/videobuf2-dma-contig.ko] undefined! Signed-off-by: NChen Gang <gang.chen.5i5j@gmail.com> Cc: Mark Salter <msalter@redhat.com> Cc: Aurelien Jacquiot <a-jacquiot@ti.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Chen Gang 提交于
When !MMU, asm-generic will not define default pgprot_writecombine, so c6x needs to define it by itself. The related error: CC [M] fs/pstore/ram_core.o fs/pstore/ram_core.c: In function 'persistent_ram_vmap': fs/pstore/ram_core.c:399:10: error: implicit declaration of function 'pgprot_writecombine' [-Werror=implicit-function-declaration] prot = pgprot_writecombine(PAGE_KERNEL); ^ fs/pstore/ram_core.c:399:8: error: incompatible types when assigning to type 'pgprot_t {aka struct <anonymous>}' from type 'int' prot = pgprot_writecombine(PAGE_KERNEL); ^ Signed-off-by: NChen Gang <gang.chen.5i5j@gmail.com> Cc: Mark Salter <msalter@redhat.com> Cc: Aurelien Jacquiot <a-jacquiot@ti.com> Cc: "Kirill A. Shutemov" <kirill@shutemov.name> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Ryusuke Konishi 提交于
According to a report from Yuxuan Shui, nilfs2 in kernel 3.19 got stuck during recovery at mount time. The code path that caused the deadlock was as follows: nilfs_fill_super() load_nilfs() nilfs_salvage_orphan_logs() * Do roll-forwarding, attach segment constructor for recovery, and kick it. nilfs_segctor_thread() nilfs_segctor_thread_construct() * A lock is held with nilfs_transaction_lock() nilfs_segctor_do_construct() nilfs_segctor_drop_written_files() iput() iput_final() write_inode_now() writeback_single_inode() __writeback_single_inode() do_writepages() nilfs_writepage() nilfs_construct_dsync_segment() nilfs_transaction_lock() --> deadlock This can happen if commit 7ef3ff2f ("nilfs2: fix deadlock of segment constructor over I_SYNC flag") is applied and roll-forward recovery was performed at mount time. The roll-forward recovery can happen if datasync write is done and the file system crashes immediately after that. For instance, we can reproduce the issue with the following steps: < nilfs2 is mounted on /nilfs (device: /dev/sdb1) > # dd if=/dev/zero of=/nilfs/test bs=4k count=1 && sync # dd if=/dev/zero of=/nilfs/test conv=notrunc oflag=dsync bs=4k count=1 && reboot -nfh < the system will immediately reboot > # mount -t nilfs2 /dev/sdb1 /nilfs The deadlock occurs because iput() can run segment constructor through writeback_single_inode() if MS_ACTIVE flag is not set on sb->s_flags. The above commit changed segment constructor so that it calls iput() asynchronously for inodes with i_nlink == 0, but that change was imperfect. This fixes the another deadlock by deferring iput() in segment constructor even for the case that mount is not finished, that is, for the case that MS_ACTIVE flag is not set. Signed-off-by: NRyusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Reported-by: NYuxuan Shui <yshuiv7@gmail.com> Tested-by: NRyusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: <stable@vger.kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Danesh Petigara 提交于
The CMA aligned offset calculation is incorrect for non-zero order_per_bit values. For example, if cma->order_per_bit=1, cma->base_pfn= 0x2f800000 and align_order=12, the function returns a value of 0x17c00 instead of 0x400. This patch fixes the CMA aligned offset calculation. The previous calculation was wrong and would return too-large values for the offset, so that when cma_alloc looks for free pages in the bitmap with the requested alignment > order_per_bit, it starts too far into the bitmap and so CMA allocations will fail despite there actually being plenty of free pages remaining. It will also probably have the wrong alignment. With this change, we will get the correct offset into the bitmap. One affected user is powerpc KVM, which has kvm_cma->order_per_bit set to KVM_CMA_CHUNK_ORDER - PAGE_SHIFT, or 18 - 12 = 6. [gregory.0xf0@gmail.com: changelog additions] Signed-off-by: NDanesh Petigara <dpetigara@broadcom.com> Reviewed-by: NGregory Fong <gregory.0xf0@gmail.com> Acked-by: NMichal Nazarewicz <mina86@mina86.com> Cc: <stable@vger.kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 David Rientjes 提交于
Now that gigantic pages are dynamically allocatable, care must be taken to ensure that p->first_page is valid before setting PageTail. If this isn't done, then it is possible to race and have compound_head() return NULL. Signed-off-by: NDavid Rientjes <rientjes@google.com> Acked-by: NDavidlohr Bueso <dave@stgolabs.net> Cc: Luiz Capitulino <lcapitulino@redhat.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Acked-by: NHillf Danton <hillf.zj@alibaba-inc.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Michal Hocko 提交于
Tetsuo Handa has pointed out that __GFP_NOFAIL allocations might fail after OOM killer is disabled if the allocation is performed by a kernel thread. This behavior was introduced from the very beginning by 7f33d49a ("mm, PM/Freezer: Disable OOM killer when tasks are frozen"). This means that the basic contract for the allocation request is broken and the context requesting such an allocation might blow up unexpectedly. There are basically two ways forward. 1) move oom_killer_disable after kernel threads are frozen. This has a risk that the OOM victim wouldn't be able to finish because it would depend on an already frozen kernel thread. This would be really tricky to debug. 2) do not fail GFP_NOFAIL allocation no matter what and risk a potential Freezable kernel threads will loop and fail the suspend. Incidental allocations after kernel threads are frozen will at least dump a warning - if we are lucky and the serial console is still active of course... This patch implements the later option because it is safer. We would see warning rather than allocation failures for the kernel threads which would blow up otherwise and have a higher chances to identify __GFP_NOFAIL users from deeper pm code. Signed-off-by: NMichal Hocko <mhocko@suse.cz> Acked-by: NDavid Rientjes <rientjes@gooogle.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Javier Martinez Canillas 提交于
Commit df9e26d0 ("rtc: s3c: add support for RTC of Exynos3250 SoC") added an "rtc_src" DT property to specify the clock used as a source to the S3C real-time clock. Not all SoCs needs this so commit eaf3a659 ("drivers/rtc/rtc-s3c.c: fix initialization failure without rtc source clock") changed to check the struct s3c_rtc_data .needs_src_clk to conditionally grab the clock. But that commit didn't update the data for each IP version so the RTC broke on the boards that needs a source clock. This is the case of at least Exynos5250 and Exynos5440 which uses the s3c6410 RTC IP block. This commit fixes the S3C rtc on the Exynos5250 Snow and Exynos5420 Peach Pit and Pi Chromebooks. Signed-off-by: NJavier Martinez Canillas <javier.martinez@collabora.co.uk> Cc: Marek Szyprowski <m.szyprowski@samsung.com> Cc: Chanwoo Choi <cw00.choi@samsung.com> Cc: Doug Anderson <dianders@chromium.org> Cc: Olof Johansson <olof@lixom.net> Cc: Kevin Hilman <khilman@linaro.org> Cc: Tyler Baker <tyler.baker@linaro.org> Cc: Alessandro Zummo <a.zummo@towertech.it> Cc: <stable@vger.kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Mark Fasheh 提交于
It turns out that making this feature ro_compat isn't quite enough to prevent accidental corruption on mount from older kernels. Ocfs2 (like other file systems) will process orphaned inodes even when the user mounts in 'ro' mode. So for the case of a filesystem not knowing the append_dio feature, mounting the filesystem could result in orphaned-for-dio files being deleted, which we clearly don't want. So instead, turn this into an incompat flag. Btw, this is kind of my fault - initially I asked that we add a flag to cover the feature and even suggested that we use an ro flag. It wasn't until I was looking through our commits for v4.0-rc1 that I realized we actually want this to be incompat. Signed-off-by: NMark Fasheh <mfasheh@suse.de> Cc: Joseph Qi <joseph.qi@huawei.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Mel Gorman 提交于
The wrong value is being returned by change_huge_pmd since commit 10c1045f ("mm: numa: avoid unnecessary TLB flushes when setting NUMA hinting entries") which allows a fallthrough that tries to adjust non-existent PTEs. This patch corrects it. Signed-off-by: NMel Gorman <mgorman@suse.de> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux由 Linus Torvalds 提交于
Pull i2c fix from Wolfram Sang: "An important bugfix for the I2C subsystem core" * 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: Revert "i2c: core: Dispose OF IRQ mapping at client removal time"
-
git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci由 Linus Torvalds 提交于
Pull PCI fixes from Bjorn Helgaas: "Here are a couple updates for v4.0. One fixes a config accessor problem on APM X-Gene that we introduced when switching to generic config accessors, and the other fixes an older read-past-end-of-buffer problem in sysfs. APM X-Gene host bridge driver - Add register offset to config space base address (Feng Kan) Miscellaneous - Don't read past the end of sysfs "driver_override" buffer (Sasha Levin)" * tag 'pci-v4.0-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: PCI: xgene: Add register offset to config space base address PCI: Don't read past the end of sysfs "driver_override" buffer
-
git://git.monstr.eu/linux-2.6-microblaze由 Linus Torvalds 提交于
Pull arch/microblaze fixes from Michal Simek: "Fix syscall error recovery. Two patches - one is just preparation patch for the second which is fixing the problem with syscalls" * tag 'microblaze-4.0-rc4' of git://git.monstr.eu/linux-2.6-microblaze: microblaze: Fix syscall error recovery for invalid syscall IDs microblaze: Coding style cleanup
-
git://git.rocketboards.org/linux-socfpga-next由 Linus Torvalds 提交于
Pull arch/nios2 fix from Ley Foon Tan: "Remove pt_regs from user header and use generic ucontext.h" * tag 'nios2-fix-4.0-rc4' of git://git.rocketboards.org/linux-socfpga-next: nios2: update pt_regs
-
- 12 3月, 2015 3 次提交
-
-
由 Linus Torvalds 提交于
Dave Chinner reported that commit 4d942466 ("mm: convert p[te|md]_mknonnuma and remaining page table manipulations") slowed down his xfsrepair test enormously. In particular, it was using more system time due to extra TLB flushing. The ultimate reason turns out to be how the change to use the regular page table accessor functions broke the NUMA grouping logic. The old special mknuma/mknonnuma code accessed the page table present bit and the magic NUMA bit directly, while the new code just changes the page protections using PROT_NONE and the regular vma protections. That sounds equivalent, and from a fault standpoint it really is, but a subtle side effect is that the *other* protection bits of the page table entries also change. And the code to decide how to group the NUMA entries together used the writable bit to decide whether a particular page was likely to be shared read-only or not. And with the change to make the NUMA handling use the regular permission setting functions, that writable bit was basically always cleared for private mappings due to COW. So even if the page actually ends up being written to in the end, the NUMA balancing would act as if it was always shared RO. This code is a heuristic anyway, so the fix - at least for now - is to instead check whether the page is dirty rather than writable. The bit doesn't change with protection changes. NOTE! This also adds a FIXME comment to revisit this issue, Not only should we probably re-visit the whole "is this a shared read-only page" heuristic (we might want to take the vma permissions into account and base this more on those than the per-page ones, and also look at whether the particular access that triggers it is a write or not), but the whole COW issue shows that we should think about the NUMA fault handling some more. For example, maybe we should do the early-COW thing that a regular fault does. Or maybe we should accept that while using the same bits as PROTNONE was a good thing (and got rid of the specual NUMA bit), we might still want to just preseve the other protection bits across NUMA faulting. Those are bigger questions, left for later. This just fixes up the heuristic so that it at least approximates working again. More analysis and work needed. Reported-by: NDave Chinner <david@fromorbit.com> Tested-by: NMel Gorman <mgorman@suse.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com> Cc: Ingo Molnar <mingo@kernel.org>, Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Jakub Kicinski 提交于
This reverts commit e4df3a0b ("i2c: core: Dispose OF IRQ mapping at client removal time") Calling irq_dispose_mapping() will destroy the mapping and disassociate the IRQ from the IRQ chip to which it belongs. Keeping it is OK, because existent mappings are reused properly. Also, this commit breaks drivers using devm* for IRQ management on OF-based systems because devm* cleanup happens in device code, after bus's remove() method returns. Signed-off-by: NJakub Kicinski <kubakici@wp.pl> Reported-by: NSébastien Szymanski <sebastien.szymanski@armadeus.com> Acked-by: NLaurent Pinchart <laurent.pinchart@ideasonboard.com> Acked-by: NDmitry Torokhov <dmitry.torokhov@gmail.com> [wsa: updated the commit message with findings fromt the other bug report] Signed-off-by: NWolfram Sang <wsa@the-dreams.de> Cc: stable@kernel.org Fixes: e4df3a0b
-
由 Chung-Ling Tang 提交于
Remove struct pt_regs from user header and use generic ucontext.h. Signed-off-by: NChung-Ling Tang <cltang@codesourcery.com> Acked-by: NLey Foon Tan <lftan@altera.com>
-
- 11 3月, 2015 2 次提交
-
-
git://git.infradead.org/linux-mtd由 Linus Torvalds 提交于
Pull MTD fixes from Brian Norris: * pxa3xx_nand - fix timeout issues when draining the FIFO (BCH only) - don't crash when no chip-selects are used * hisi504_nand - depend on HAS_DMA, to fix compile errors * tag 'for-linus-20150310' of git://git.infradead.org/linux-mtd: mtd: nand: MTD_NAND_HISI504 should depend on HAS_DMA mtd: pxa3xx_nand: fix driver when num_cs is 0 mtd: nand: pxa3xx: Fix PIO FIFO draining
-
git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu由 Linus Torvalds 提交于
Pull iommu fixes from Joerg Roedel: "The patches contain: - fix multiple ARM IOMMU drivers to behave well when the hardware is not present - mark MSM driver as broken - fix build errors with the new ARM generic io-page-table code" * tag 'iommu-fixes-v4.0-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: iommu/io-pgtable-arm: Add built time dependency iommu/msm: Mark driver BROKEN iommu/rockchip: Play nice in multi-platform builds iommu/omap: Play nice in multi-platform builds iommu/exynos: Play nice in multi-platform builds iommu/io-pgtable-arm: Fix self-test WARNs on i386
-
- 10 3月, 2015 3 次提交
-
-
git://git.kernel.org/pub/scm/virt/kvm/kvm由 Linus Torvalds 提交于
Pull kvm/s390 bugfixes from Marcelo Tosatti. * git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: s390: non-LPAR case obsolete during facilities mask init KVM: s390: include guest facilities in kvm facility test KVM: s390: fix in memory copy of facility lists KVM: s390/cpacf: Fix kernel bug under z/VM KVM: s390/cpacf: Enable key wrapping by default
-
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux由 Linus Torvalds 提交于
Pull s390 fixes from Martin Schwidefsky: "One performance optimization for page_clear and a couple of bug fixes" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: s390/mm: fix incorrect ASCE after crst_table_downgrade s390/ftrace: fix crashes when switching tracers / add notrace to cpu_relax() s390/pci: unify pci_iomap symbol exports s390/pci: fix [un]map_resources sequence s390: let the compiler do page clearing s390/pci: fix possible information leak in mmio syscall s390/dcss: array index 'i' is used before limits check. s390/scm_block: fix off by one during cluster reservation s390/jump label: improve and fix sanity check s390/jump label: add missing jump_label_apply_nops() call
-
由 Linus Torvalds 提交于
Merge tag 'trace-fixes-v4.0-rc2-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull seq-buf/ftrace fixes from Steven Rostedt: "This includes fixes for seq_buf_bprintf() truncation issue. It also contains fixes to ftrace when /proc/sys/kernel/ftrace_enabled and function tracing are started. Doing the following causes some issues: # echo 0 > /proc/sys/kernel/ftrace_enabled # echo function_graph > /sys/kernel/debug/tracing/current_tracer # echo 1 > /proc/sys/kernel/ftrace_enabled # echo nop > /sys/kernel/debug/tracing/current_tracer # echo function_graph > /sys/kernel/debug/tracing/current_tracer As well as with function tracing too. Pratyush Anand first reported this issue to me and supplied a patch. When I tested this on my x86 test box, it caused thousands of backtraces and warnings to appear in dmesg, which also caused a denial of service (a warning for every function that was listed). I applied Pratyush's patch but it did not fix the issue for me. I looked into it and found a slight problem with trampoline accounting. I fixed it and sent Pratyush a patch, but he said that it did not fix the issue for him. I later learned tha Pratyush was using an ARM64 server, and when I tested on my ARM board, I was able to reproduce the same issue as Pratyush. After applying his patch, it fixed the problem. The above test uncovered two different bugs, one in x86 and one in ARM and ARM64. As this looked like it would affect PowerPC, I tested it on my PPC64 box. It too broke, but neither the patch that fixed ARM or x86 fixed this box (the changes were all in generic code!). The above test, uncovered two more bugs that affected PowerPC. Again, the changes were only done to generic code. It's the way the arch code expected things to be done that was different between the archs. Some where more sensitive than others. The rest of this series fixes the PPC bugs as well" * tag 'trace-fixes-v4.0-rc2-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: ftrace: Fix ftrace enable ordering of sysctl ftrace_enabled ftrace: Fix en(dis)able graph caller when en(dis)abling record via sysctl ftrace: Clear REGS_EN and TRAMP_EN flags on disabling record via sysctl seq_buf: Fix seq_buf_bprintf() truncation seq_buf: Fix seq_buf_vprintf() truncation
-