1. 09 4月, 2016 4 次提交
  2. 05 4月, 2016 1 次提交
    • K
      mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros · 09cbfeaf
      Kirill A. Shutemov 提交于
      PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced *long* time
      ago with promise that one day it will be possible to implement page
      cache with bigger chunks than PAGE_SIZE.
      
      This promise never materialized.  And unlikely will.
      
      We have many places where PAGE_CACHE_SIZE assumed to be equal to
      PAGE_SIZE.  And it's constant source of confusion on whether
      PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
      especially on the border between fs and mm.
      
      Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause to much
      breakage to be doable.
      
      Let's stop pretending that pages in page cache are special.  They are
      not.
      
      The changes are pretty straight-forward:
      
       - <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
      
       - <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
      
       - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};
      
       - page_cache_get() -> get_page();
      
       - page_cache_release() -> put_page();
      
      This patch contains automated changes generated with coccinelle using
      script below.  For some reason, coccinelle doesn't patch header files.
      I've called spatch for them manually.
      
      The only adjustment after coccinelle is revert of changes to
      PAGE_CAHCE_ALIGN definition: we are going to drop it later.
      
      There are few places in the code where coccinelle didn't reach.  I'll
      fix them manually in a separate patch.  Comments and documentation also
      will be addressed with the separate patch.
      
      virtual patch
      
      @@
      expression E;
      @@
      - E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
      + E
      
      @@
      expression E;
      @@
      - E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
      + E
      
      @@
      @@
      - PAGE_CACHE_SHIFT
      + PAGE_SHIFT
      
      @@
      @@
      - PAGE_CACHE_SIZE
      + PAGE_SIZE
      
      @@
      @@
      - PAGE_CACHE_MASK
      + PAGE_MASK
      
      @@
      expression E;
      @@
      - PAGE_CACHE_ALIGN(E)
      + PAGE_ALIGN(E)
      
      @@
      expression E;
      @@
      - page_cache_get(E)
      + get_page(E)
      
      @@
      expression E;
      @@
      - page_cache_release(E)
      + put_page(E)
      Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Acked-by: NMichal Hocko <mhocko@suse.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      09cbfeaf
  3. 31 3月, 2016 2 次提交
  4. 26 3月, 2016 1 次提交
  5. 23 3月, 2016 3 次提交
  6. 02 3月, 2016 3 次提交
    • H
      parisc: Wire up copy_file_range syscall · b4f09ae6
      Helge Deller 提交于
      Signed-off-by: NHelge Deller <deller@gmx.de>
      b4f09ae6
    • H
      parisc: Fix ptrace syscall number and return value modification · 98e8b6c9
      Helge Deller 提交于
      Mike Frysinger reported that his ptrace testcase showed strange
      behaviour on parisc: It was not possible to avoid a syscall and the
      return value of a syscall couldn't be changed.
      
      To modify a syscall number, we were missing to save the new syscall
      number to gr20 which is then picked up later in assembly again.
      
      The effect that the return value couldn't be changed is a side-effect of
      another bug in the assembly code. When a process is ptraced, userspace
      expects each syscall to report entrance and exit of a syscall.  If a
      syscall number was given which doesn't exist, we jumped to the normal
      syscall exit code instead of informing userspace that the (non-existant)
      syscall exits. This unexpected behaviour confuses userspace and thus the
      bug was misinterpreted as if we can't change the return value.
      
      This patch fixes both problems and was tested on 64bit kernel with
      32bit userspace.
      Signed-off-by: NHelge Deller <deller@gmx.de>
      Cc: Mike Frysinger <vapier@gentoo.org>
      Cc: stable@vger.kernel.org  # v4.0+
      Tested-by: NMike Frysinger <vapier@gentoo.org>
      98e8b6c9
    • T
      arch/hotplug: Call into idle with a proper state · fc6d73d6
      Thomas Gleixner 提交于
      Let the non boot cpus call into idle with the corresponding hotplug state, so
      the hotplug core can handle the further bringup. That's a first step to
      convert the boot side of the hotplugged cpus to do all the synchronization
      with the other side through the state machine. For now it'll only start the
      hotplug thread and kick the full bringup of the cpu.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: linux-arch@vger.kernel.org
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Rafael Wysocki <rafael.j.wysocki@intel.com>
      Cc: "Srivatsa S. Bhat" <srivatsa@mit.edu>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Sebastian Siewior <bigeasy@linutronix.de>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul Turner <pjt@google.com>
      Link: http://lkml.kernel.org/r/20160226182341.614102639@linutronix.deSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      fc6d73d6
  7. 21 1月, 2016 1 次提交
  8. 13 1月, 2016 3 次提交
  9. 21 12月, 2015 1 次提交
    • H
      parisc: Fix syscall restarts · 71a71fb5
      Helge Deller 提交于
      On parisc syscalls which are interrupted by signals sometimes failed to
      restart and instead returned -ENOSYS which in the worst case lead to
      userspace crashes.
      A similiar problem existed on MIPS and was fixed by commit e967ef02
      ("MIPS: Fix restart of indirect syscalls").
      
      On parisc the current syscall restart code assumes that all syscall
      callers load the syscall number in the delay slot of the ble
      instruction. That's how it is e.g. done in the unistd.h header file:
      	ble 0x100(%sr2, %r0)
      	ldi #syscall_nr, %r20
      Because of that assumption the current code never restored %r20 before
      returning to userspace.
      
      This assumption is at least not true for code which uses the glibc
      syscall() function, which instead uses this syntax:
      	ble 0x100(%sr2, %r0)
      	copy regX, %r20
      where regX depend on how the compiler optimizes the code and register
      usage.
      
      This patch fixes this problem by adding code to analyze how the syscall
      number is loaded in the delay branch and - if needed - copy the syscall
      number to regX prior returning to userspace for the syscall restart.
      Signed-off-by: NHelge Deller <deller@gmx.de>
      Cc: stable@vger.kernel.org
      Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      71a71fb5
  10. 12 12月, 2015 2 次提交
  11. 05 12月, 2015 1 次提交
  12. 22 11月, 2015 5 次提交
    • H
      parisc: Map kernel text and data on huge pages · 41b85a11
      Helge Deller 提交于
      Adjust the linker script and map_pages() to map kernel text and data on
      physical 1MB huge/large pages.
      Signed-off-by: NHelge Deller <deller@gmx.de>
      41b85a11
    • H
      parisc: Add Huge Page and HUGETLBFS support · 736d2169
      Helge Deller 提交于
      This patch adds huge page support to allow userspace to allocate huge
      pages and to use hugetlbfs filesystem on 32- and 64-bit Linux kernels.
      A later patch will add kernel support to map kernel text and data on
      huge pages.
      
      The only requirement is, that the kernel needs to be compiled for a
      PA8X00 CPU (PA2.0 architecture). Older PA1.X CPUs do not support
      variable page sizes. 64bit Kernels are compiled for PA2.0 by default.
      
      Technically on parisc multiple physical huge pages may be needed to
      emulate standard 2MB huge pages.
      Signed-off-by: NHelge Deller <deller@gmx.de>
      736d2169
    • H
      parisc: Use long branch to do_syscall_trace_exit · 337685e5
      Helge Deller 提交于
      Use the 22bit instead of the 17bit branch instruction on a 64bit kernel
      to reach the do_syscall_trace_exit function from the gateway page.
      A huge page enabled kernel may need the additional branch distance bits.
      Signed-off-by: NHelge Deller <deller@gmx.de>
      337685e5
    • H
      parisc: Increase initial kernel mapping to 32MB on 64bit kernel · 332b42e4
      Helge Deller 提交于
      For the 64bit kernel the initially 16 MB kernel memory might become too
      small if you build a kernel with many modules built-in and with kernel
      text and data areas mapped on huge pages.
      
      This patch increases the initial mapping to 32MB for 64bit kernels and
      keeps 16MB for 32bit kernels.
      Signed-off-by: NHelge Deller <deller@gmx.de>
      332b42e4
    • H
      parisc: Initialize the fault vector earlier in the boot process. · 4182d0cd
      Helge Deller 提交于
      A fault vector on parisc needs to be 2K aligned.  Furthermore the
      checksum of the fault vector needs to sum up to 0 which is being
      calculated and written at runtime.
      
      Up to now we aligned both PA20 and PA11 fault vectors on the same 4K
      page in order to easily write the checksum after having mapped the
      kernel read-only (by mapping this page only as read-write).
      But when we want to map the kernel text and data on huge pages this
      makes things harder.
      So, simplify it by aligning both fault vectors on 2K boundries and write
      the checksum before we map the page read-only.
      Signed-off-by: NHelge Deller <deller@gmx.de>
      4182d0cd
  13. 22 10月, 2015 2 次提交
  14. 08 9月, 2015 4 次提交
    • H
      6dc0dcde
    • H
      parisc: Drop CONFIG_SMP around update_cr16_clocksource() · 72581cec
      Helge Deller 提交于
      No need to use CONFIG_SMP around update_cr16_clocksource(). It checks for
      num_online_cpus() beeing greater than 1, which is always 1 in UP builds.
      Signed-off-by: NHelge Deller <deller@gmx.de>
      72581cec
    • J
      parisc: Use double word condition in 64bit CAS operation · 1b59ddfc
      John David Anglin 提交于
      The attached change fixes the condition used in the "sub" instruction.
      A double word comparison is needed.  This fixes the 64-bit LWS CAS
      operation on 64-bit kernels.
      
      I can now enable 64-bit atomic support in GCC.
      
      Cc: <stable@vger.kernel.org>
      Signed-off-by: John David Anglin <dave.anglin>
      Signed-off-by: NHelge Deller <deller@gmx.de>
      1b59ddfc
    • H
      parisc: Filter out spurious interrupts in PA-RISC irq handler · b1b4e435
      Helge Deller 提交于
      When detecting a serial port on newer PA-RISC machines (with iosapic) we have a
      long way to go to find the right IRQ line, registering it, then registering the
      serial port and the irq handler for the serial port. During this phase spurious
      interrupts for the serial port may happen which then crashes the kernel because
      the action handler might not have been set up yet.
      
      So, basically it's a race condition between the serial port hardware and the
      CPU which sets up the necessary fields in the irq sructs. The main reason for
      this race is, that we unmask the serial port irqs too early without having set
      up everything properly before (which isn't easily possible because we need the
      IRQ number to register the serial ports).
      
      This patch is a work-around for this problem. It adds checks to the CPU irq
      handler to verify if the IRQ action field has been initialized already. If not,
      we just skip this interrupt (which isn't critical for a serial port at bootup).
      The real fix would probably involve rewriting all PA-RISC specific IRQ code
      (for CPU, IOSAPIC, GSC and EISA) to use IRQ domains with proper parenting of
      the irq chips and proper irq enabling along this line.
      
      This bug has been in the PA-RISC port since the beginning, but the crashes
      happened very rarely with currently used hardware.  But on the latest machine
      which I bought (a C8000 workstation), which uses the fastest CPUs (4 x PA8900,
      1GHz) and which has the largest possible L1 cache size (64MB each), the kernel
      crashed at every boot because of this race. So, without this patch the machine
      would currently be unuseable.
      
      For the record, here is the flow logic:
      1. serial_init_chip() in 8250_gsc.c calls iosapic_serial_irq().
      2. iosapic_serial_irq() calls txn_alloc_irq() to find the irq.
      3. iosapic_serial_irq() calls cpu_claim_irq() to register the CPU irq
      4. cpu_claim_irq() unmasks the CPU irq (which it shouldn't!)
      5. serial_init_chip() then registers the 8250 port.
      Problems:
      - In step 4 the CPU irq shouldn't have been registered yet, but after step 5
      - If serial irq happens between 4 and 5 have finished, the kernel will crash
      Signed-off-by: NHelge Deller <deller@gmx.de>
      b1b4e435
  15. 01 8月, 2015 1 次提交
  16. 11 7月, 2015 1 次提交
    • J
      parisc: Fix some PTE/TLB race conditions and optimize __flush_tlb_range based on timing results · 01ab6057
      John David Anglin 提交于
      The increased use of pdtlb/pitlb instructions seemed to increase the
      frequency of random segmentation faults building packages. Further, we
      had a number of cases where TLB inserts would repeatedly fail and all
      forward progress would stop. The Haskell ghc package caused a lot of
      trouble in this area. The final indication of a race in pte handling was
      this syslog entry on sibaris (C8000):
      
       swap_free: Unused swap offset entry 00000004
       BUG: Bad page map in process mysqld  pte:00000100 pmd:019bbec5
       addr:00000000ec464000 vm_flags:00100073 anon_vma:0000000221023828 mapping: (null) index:ec464
       CPU: 1 PID: 9176 Comm: mysqld Not tainted 4.0.0-2-parisc64-smp #1 Debian 4.0.5-1
       Backtrace:
        [<0000000040173eb0>] show_stack+0x20/0x38
        [<0000000040444424>] dump_stack+0x9c/0x110
        [<00000000402a0d38>] print_bad_pte+0x1a8/0x278
        [<00000000402a28b8>] unmap_single_vma+0x3d8/0x770
        [<00000000402a4090>] zap_page_range+0xf0/0x198
        [<00000000402ba2a4>] SyS_madvise+0x404/0x8c0
      
      Note that the pte value is 0 except for the accessed bit 0x100. This bit
      shouldn't be set without the present bit.
      
      It should be noted that the madvise system call is probably a trigger for many
      of the random segmentation faults.
      
      In looking at the kernel code, I found the following problems:
      
      1) The pte_clear define didn't take TLB lock when clearing a pte.
      2) We didn't test pte present bit inside lock in exception support.
      3) The pte and tlb locks needed to merged in order to ensure consistency
      between page table and TLB. This also has the effect of serializing TLB
      broadcasts on SMP systems.
      
      The attached change implements the above and a few other tweaks to try
      to improve performance. Based on the timing code, TLB purges are very
      slow (e.g., ~ 209 cycles per page on rp3440). Thus, I think it
      beneficial to test the split_tlb variable to avoid duplicate purges.
      Probably, all PA 2.0 machines have combined TLBs.
      
      I dropped using __flush_tlb_range in flush_tlb_mm as I realized all
      applications and most threads have a stack size that is too large to
      make this useful. I added some comments to this effect.
      
      Since implementing 1 through 3, I haven't had any random segmentation
      faults on mx3210 (rp3440) in about one week of building code and running
      as a Debian buildd.
      Signed-off-by: NJohn David Anglin <dave.anglin@bell.net>
      Cc: stable@vger.kernel.org # v3.18+
      Signed-off-by: NHelge Deller <deller@gmx.de>
      01ab6057
  17. 25 6月, 2015 1 次提交
  18. 17 6月, 2015 2 次提交
    • P
      parisc64: don't use module_init for non-modular core perf code · 15becabd
      Paul Gortmaker 提交于
      The perf.c code depends on CONFIG_64BIT, so it is either built-in
      or absent.  It will never be modular, so using module_init as an
      alias for __initcall is rather misleading.
      
      Fix this up now, so that we can relocate module_init from
      init.h into module.h in the future.  If we don't do this, we'd
      have to add module.h to obviously non-modular code, and that
      would be a worse thing.  Aside from it not making sense, it also
      causes a ~10% increase in CPP overhead due to module.h having a
      large list of headers itself -- for example compare line counts:
      
       device_initcall() and <linux/init.h>
      	20238 arch/parisc/kernel/perf.i
      
       module_init() and <linux/module.h>
      	22194 arch/parisc/kernel/perf.i
      
      Direct use of __initcall is discouraged, vs prioritized ones.
      Use of device_initcall is consistent with what __initcall
      maps onto, and hence does not change the init order, making the
      impact of this change zero.   Should someone with real hardware
      for boot testing want to change it later to arch_initcall or
      something different, they can do that at a later date.
      
      Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
      Cc: Helge Deller <deller@gmx.de>
      Cc: linux-parisc@vger.kernel.org
      Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com>
      15becabd
    • P
      parisc: don't use module_init for non-modular core pdc_cons code · aed6850a
      Paul Gortmaker 提交于
      The pdc_cons.c code is always built in.  It will never be modular,
      so using module_init as an alias for __initcall is rather
      misleading.
      
      Fix this up now, so that we can relocate module_init from
      init.h into module.h in the future.  If we don't do this, we'd
      have to add module.h to obviously non-modular code, and that
      would be a worse thing.
      
      Direct use of __initcall is discouraged, vs prioritized ones.
      Use of device_initcall is consistent with what __initcall
      maps onto, and hence does not change the init order, making the
      impact of this change zero.   Should someone with real hardware
      for boot testing want to change it later to arch_initcall or
      something different, they can do that at a later date.
      Reported-by: Nkbuild test robot <fengguang.wu@intel.com>
      Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
      Cc: Helge Deller <deller@gmx.de>
      Cc: linux-parisc@vger.kernel.org
      Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com>
      aed6850a
  19. 19 5月, 2015 1 次提交
    • D
      mm/fault, arch: Use pagefault_disable() to check for disabled pagefaults in the handler · 70ffdb93
      David Hildenbrand 提交于
      Introduce faulthandler_disabled() and use it to check for irq context and
      disabled pagefaults (via pagefault_disable()) in the pagefault handlers.
      
      Please note that we keep the in_atomic() checks in place - to detect
      whether in irq context (in which case preemption is always properly
      disabled).
      
      In contrast, preempt_disable() should never be used to disable pagefaults.
      With !CONFIG_PREEMPT_COUNT, preempt_disable() doesn't modify the preempt
      counter, and therefore the result of in_atomic() differs.
      We validate that condition by using might_fault() checks when calling
      might_sleep().
      
      Therefore, add a comment to faulthandler_disabled(), describing why this
      is needed.
      
      faulthandler_disabled() and pagefault_disable() are defined in
      linux/uaccess.h, so let's properly add that include to all relevant files.
      
      This patch is based on a patch from Thomas Gleixner.
      Reviewed-and-tested-by: NThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NDavid Hildenbrand <dahi@linux.vnet.ibm.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: David.Laight@ACULAB.COM
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: airlied@linux.ie
      Cc: akpm@linux-foundation.org
      Cc: benh@kernel.crashing.org
      Cc: bigeasy@linutronix.de
      Cc: borntraeger@de.ibm.com
      Cc: daniel.vetter@intel.com
      Cc: heiko.carstens@de.ibm.com
      Cc: herbert@gondor.apana.org.au
      Cc: hocko@suse.cz
      Cc: hughd@google.com
      Cc: mst@redhat.com
      Cc: paulus@samba.org
      Cc: ralf@linux-mips.org
      Cc: schwidefsky@de.ibm.com
      Cc: yang.shi@windriver.com
      Link: http://lkml.kernel.org/r/1431359540-32227-7-git-send-email-dahi@linux.vnet.ibm.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      70ffdb93
  20. 13 5月, 2015 1 次提交
    • H
      parisc,metag: Fix crashes due to stack randomization on stack-grows-upwards architectures · d045c77c
      Helge Deller 提交于
      On architectures where the stack grows upwards (CONFIG_STACK_GROWSUP=y,
      currently parisc and metag only) stack randomization sometimes leads to crashes
      when the stack ulimit is set to lower values than STACK_RND_MASK (which is 8 MB
      by default if not defined in arch-specific headers).
      
      The problem is, that when the stack vm_area_struct is set up in fs/exec.c, the
      additional space needed for the stack randomization (as defined by the value of
      STACK_RND_MASK) was not taken into account yet and as such, when the stack
      randomization code added a random offset to the stack start, the stack
      effectively got smaller than what the user defined via rlimit_max(RLIMIT_STACK)
      which then sometimes leads to out-of-stack situations and crashes.
      
      This patch fixes it by adding the maximum possible amount of memory (based on
      STACK_RND_MASK) which theoretically could be added by the stack randomization
      code to the initial stack size. That way, the user-defined stack size is always
      guaranteed to be at minimum what is defined via rlimit_max(RLIMIT_STACK).
      
      This bug is currently not visible on the metag architecture, because on metag
      STACK_RND_MASK is defined to 0 which effectively disables stack randomization.
      
      The changes to fs/exec.c are inside an "#ifdef CONFIG_STACK_GROWSUP"
      section, so it does not affect other platformws beside those where the
      stack grows upwards (parisc and metag).
      Signed-off-by: NHelge Deller <deller@gmx.de>
      Cc: linux-parisc@vger.kernel.org
      Cc: James Hogan <james.hogan@imgtec.com>
      Cc: linux-metag@vger.kernel.org
      Cc: stable@vger.kernel.org # v3.16+
      d045c77c