1. 01 Nov 2011, 1 commit
    • Cross Memory Attach · fcf63409
      Christopher Yeoh authored
      The basic idea behind cross memory attach is to allow MPI programs doing
      intra-node communication to do a single copy of the message rather than a
      double copy of the message via shared memory.
      
      The following patch attempts to achieve this by allowing a destination
      process, given an address and size from a source process, to copy memory
      directly from the source process into its own address space via a system
      call.  There is also a symmetrical ability to copy from the current
      process's address space into a destination process's address space.
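
      For illustration, here is a minimal userspace sketch (not part of the
      patch itself) of how a destination process could pull a buffer out of a
      source process with the proposed process_vm_readv() call.  It assumes
      the glibc wrapper that was added later (glibc 2.15); with an older libc
      the raw syscall() interface would be needed, and in practice the remote
      address would be exchanged over some existing channel beforehand.

        #define _GNU_SOURCE
        #include <stdio.h>
        #include <stdlib.h>
        #include <sys/types.h>
        #include <sys/uio.h>

        /* Copy `len` bytes from `remote_addr` in process `pid` into `buf`. */
        static ssize_t read_remote(pid_t pid, void *remote_addr,
                                   void *buf, size_t len)
        {
            struct iovec local  = { .iov_base = buf,         .iov_len = len };
            struct iovec remote = { .iov_base = remote_addr, .iov_len = len };

            /* One local iovec, one remote iovec; flags must currently be 0. */
            return process_vm_readv(pid, &local, 1, &remote, 1, 0);
        }

        int main(int argc, char **argv)
        {
            if (argc != 3) {
                fprintf(stderr, "usage: %s <pid> <hex-address>\n", argv[0]);
                return 1;
            }

            pid_t pid = (pid_t)atoi(argv[1]);
            void *addr = (void *)strtoul(argv[2], NULL, 16);
            char buf[64];

            ssize_t n = read_remote(pid, addr, buf, sizeof(buf));
            if (n < 0) {
                perror("process_vm_readv");
                return 1;
            }
            printf("read %zd bytes from pid %d\n", n, (int)pid);
            return 0;
        }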
      
      - Use of /proc/pid/mem has been considered, but there are issues with
        using it:
        - Does not allow specifying iovecs for both src and dest; assuming
          preadv or pwritev were implemented, either the area read from or
          the area written to would need to be contiguous.
        - Currently mem_read only allows a process that is currently
          ptrace'ing the target (and is still able to ptrace it) to read
          from the target.  This check could possibly be moved to the open
          call, but it's not clear exactly what race this restriction is
          stopping (the reason appears to have been lost).
        - Having to send the fd of /proc/self/mem via SCM_RIGHTS on a unix
          domain socket is a bit ugly from a userspace point of view,
          especially when you may have hundreds if not (eventually)
          thousands of processes that all need to do this with each other.
        - Doesn't allow for some future uses of the interface that we would
          like to consider adding (see below).
        - Interestingly reading from /proc/pid/mem currently actually
        involves two copies! (But this could be fixed pretty easily)
      
      As mentioned previously, use of vmsplice was considered instead, but it
      has problems.  Since you need the reader and writer working
      co-operatively, you block if the pipe is not drained, which requires
      some wrapping to do non-blocking I/O on the send side or polling on the
      receive side.  In all-to-all communication it requires ordering,
      otherwise you can deadlock.  And in the example of many MPI tasks
      writing to one MPI task, vmsplice serialises the copying.
      
      There are some cases of MPI collectives where even a single-copy
      interface does not get us all the performance gain we could have.  For
      example, in an MPI_Reduce, rather than copy the data from the source we
      would like to use it directly in a math op (say the reduce is doing a
      sum), as this would save us doing a copy.  We don't need to keep a copy
      of the data from the source.  I haven't implemented this, but I think
      this interface could in the future do all of that through the use of
      the flags - e.g. specify the math operation and type, and the kernel,
      rather than just copying the data, would apply the specified operation
      between the source and destination and store the result in the
      destination.
      
      Although we don't have a "second user" of the interface yet (though
      I've had some nibbles from people who may be interested in using it for
      intra-process messaging that is not MPI), this interface is something
      which hardware vendors are already implementing in their custom drivers
      for fast local communication.  So in addition to being useful for
      OpenMPI, it would mean the driver maintainers don't have to fix things
      up when the mm changes.
      
      There was some discussion about how much faster a true zero copy would
      go. Here's a link back to the email with some testing I did on that:
      
      http://marc.info/?l=linux-mm&m=130105930902915&w=2
      
      There is a basic man page for the proposed interface here:
      
      http://ozlabs.org/~cyeoh/cma/process_vm_readv.txt
      
      This has been implemented for x86 and powerpc; other architectures
      should mainly (I think) just need to add syscall numbers for
      process_vm_readv and process_vm_writev.  There are 32-bit compatibility
      versions for 64-bit kernels.
      
      For arch maintainers there are some simple tests to be able to quickly
      verify that the syscalls are working correctly here:
      
      http://ozlabs.org/~cyeoh/cma/cma-test-20110718.tgz
      Signed-off-by: Chris Yeoh <yeohc@au1.ibm.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: David Howells <dhowells@redhat.com>
      Cc: James Morris <jmorris@namei.org>
      Cc: <linux-man@vger.kernel.org>
      Cc: <linux-arch@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  2. 31 Oct 2011, 1 commit
  3. 30 Oct 2011, 5 commits
  4. 29 Oct 2011, 2 commits
  5. 28 Oct 2011, 1 commit
    • compat: sync compat_stats with statfs. · 1448c721
      Eric W. Biederman authored
      This was found by inspection while tracking a similar bug in
      compat_statfs64 that has been fixed in mainline since December.
      
      - This fixes a bug where not all of the f_spare fields
        were cleared on mips and s390.
      - Add the f_flags field to struct compat_statfs
      - Copy f_flags to userspace in case someone cares.
      - Use __clear_user to clear the f_spare field in the userspace buffer,
        so that all of the elements of f_spare are zeroed (a rough sketch of
        this follows the list).  On some architectures f_spare has 5 ints and
        on others only 4, which makes the previous technique of clearing each
        int individually broken.
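
      A rough sketch (not the literal patch) of what the copy-out could look
      like with this approach; the helper name is made up, the field list is
      abbreviated, and the access_ok() check on the user buffer is assumed to
      have been done already:

        #include <linux/compat.h>
        #include <linux/errno.h>
        #include <linux/statfs.h>
        #include <linux/uaccess.h>

        /* Hypothetical put_compat_statfs()-style helper. */
        static int copy_compat_statfs_to_user(struct compat_statfs __user *ubuf,
                                              const struct kstatfs *kbuf)
        {
            if (__put_user(kbuf->f_type, &ubuf->f_type) ||
                __put_user(kbuf->f_bsize, &ubuf->f_bsize) ||
                __put_user(kbuf->f_blocks, &ubuf->f_blocks) ||
                __put_user(kbuf->f_flags, &ubuf->f_flags) ||
                /* ... remaining fields elided ... */
                /* sizeof() covers however many spare ints the arch defines */
                __clear_user(ubuf->f_spare, sizeof(ubuf->f_spare)))
                    return -EFAULT;
            return 0;
        }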
      
      I don't expect anyone actually uses the old statfs system call anymore,
      but if they do, let them benefit from having the compat and the native
      versions working the same.
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
  6. 27 Oct 2011, 7 commits
  7. 26 Oct 2011, 15 commits
  8. 25 Oct 2011, 5 commits
    • x86: Fix compilation bug in kprobes' twobyte_is_boostable · 315eb8a2
      Josh Stone authored
      When compiling an i386_defconfig kernel with gcc-4.6.1-9.fc15.i686, I
      noticed a warning about the asm operand for test_bit in kprobes'
      can_boost.  I discovered that this caused only the first long of
      twobyte_is_boostable[] to be output.
      
      Jakub filed and fixed gcc PR50571 to correct the warning and this
      output issue.  But to solve it for older gcc versions, we can make
      kprobes' twobyte_is_boostable[] non-const, and then it won't be
      optimized out.
      
      Before:
      
          CC      arch/x86/kernel/kprobes.o
        In file included from include/linux/bitops.h:22:0,
                         from include/linux/kernel.h:17,
                         from [...]/arch/x86/include/asm/percpu.h:44,
                         from [...]/arch/x86/include/asm/current.h:5,
                         from [...]/arch/x86/include/asm/processor.h:15,
                         from [...]/arch/x86/include/asm/atomic.h:6,
                         from include/linux/atomic.h:4,
                         from include/linux/mutex.h:18,
                         from include/linux/notifier.h:13,
                         from include/linux/kprobes.h:34,
                         from arch/x86/kernel/kprobes.c:43:
        [...]/arch/x86/include/asm/bitops.h: In function ‘can_boost.part.1’:
        [...]/arch/x86/include/asm/bitops.h:319:2: warning: use of memory input
              without lvalue in asm operand 1 is deprecated [enabled by default]
      
        $ objdump -rd arch/x86/kernel/kprobes.o | grep -A1 -w bt
             551:	0f a3 05 00 00 00 00 	bt     %eax,0x0
                                554: R_386_32	.rodata.cst4
      
        $ objdump -s -j .rodata.cst4 -j .data arch/x86/kernel/kprobes.o
      
        arch/x86/kernel/kprobes.o:     file format elf32-i386
      
        Contents of section .data:
         0000 48000000 00000000 00000000 00000000  H...............
        Contents of section .rodata.cst4:
         0000 4c030000                             L...
      
      Only a single long of twobyte_is_boostable[] is in the object file.
      
      After, without the const on twobyte_is_boostable:
      
        $ objdump -rd arch/x86/kernel/kprobes.o | grep -A1 -w bt
             551:	0f a3 05 20 00 00 00 	bt     %eax,0x20
                                554: R_386_32	.data
      
        $ objdump -s -j .rodata.cst4 -j .data arch/x86/kernel/kprobes.o
      
        arch/x86/kernel/kprobes.o:     file format elf32-i386
      
        Contents of section .data:
         0000 48000000 00000000 00000000 00000000  H...............
         0010 00000000 00000000 00000000 00000000  ................
         0020 4c030000 0f000200 ffff0000 ffcff0c0  L...............
         0030 0000ffff 3bbbfff8 03ff2ebb 26bb2e77  ....;.......&..w
      
      Now all 32 bytes are output into .data instead.
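
      Schematically (a sketch of the kind of change described, not the
      literal patch; the declaration and the helper around test_bit are
      abbreviated here), the fix amounts to dropping the const qualifier so
      gcc keeps the whole bitmap in .data:

        #include <linux/bitops.h>
        #include <linux/types.h>

        /*
         * Before: static const unsigned long twobyte_is_boostable[256 / BITS_PER_LONG];
         * After (non-const, so the full 256-bit table stays in .data):
         */
        static unsigned long twobyte_is_boostable[256 / BITS_PER_LONG] = {
            /* ... bitmap of boostable two-byte opcodes, values elided ... */
        };

        static int is_twobyte_boostable(u8 opcode)
        {
            /* test_bit() is the asm operand that triggered the warning */
            return test_bit(opcode, twobyte_is_boostable);
        }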
      Signed-off-by: Josh Stone <jistone@redhat.com>
      Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Jakub Jelinek <jakub@redhat.com>
      Cc: stable@kernel.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • ARM: 7139/1: fix compilation with CONFIG_ARM_ATAG_DTB_COMPAT and large TEXT_OFFSET · 531a6a94
      Nicolas Pitre authored
      If TEXT_OFFSET is too large (e.g. on MSM), the resulting immediate
      argument gets wider than 8 bits, which the ARM immediate encoding
      cannot represent.
      
      Noticed by David Brown <davidb@codeaurora.org>
      Signed-off-by: Nicolas Pitre <nicolas.pitre@linaro.org>
      Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
    • m68k: Finally remove leftover markers sections · bc74ee97
      Kirill Tkhai authored
      Markers have already been removed twice:
      
      1: fc537766
      2: eb878b3b
      
      But a little bit is still here.
      Signed-off-by: Tkhai Kirill <tkhai@yandex.ru>
      Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
    • m68k/mac: Fix mac_irq_pending() for PSC MACE and SCC · 8b223432
      Finn Thain authored
      Add a missing return statement.  The docs say that the level 4 PSC IRQs
      relate to MACE DMA and SCC.  Since those drivers don't call
      mac_irq_pending(), this patch has no effect, but it should be fixed all
      the same, since it can be useful for MACE debugging.
      Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
      Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
    • m68k/mac: Fix compiler warning in via_read_time() · 75a23850
      Finn Thain authored
      The algorithm described in the comment compares two successive reads
      from the RTC, but the code actually reads once and compares the result
      to an uninitialized value.  This causes the compiler to warn that
      "last_result may be used uninitialized".  Make the code match the
      comment, fix the warning, and perhaps improve reliability.  Tested on a
      Quadra 700.
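
      As an illustration of the read-until-stable pattern the comment
      describes (a generic sketch, not the m68k code itself; read_rtc_raw()
      is a hypothetical stand-in for the actual VIA register accesses):

        #include <stdint.h>

        /* Hypothetical helper: one raw read of the RTC counter. */
        extern uint32_t read_rtc_raw(void);

        /*
         * Read the RTC until two consecutive reads agree, so a value read
         * while the counter is being updated is never returned.
         */
        static uint32_t read_rtc_stable(void)
        {
            uint32_t last, now;

            now = read_rtc_raw();
            do {
                last = now;
                now = read_rtc_raw();
            } while (now != last);

            return now;
        }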
      Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
      Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
  9. 24 Oct 2011, 2 commits
    • x86: Fix S4 regression · 8548c84d
      Takashi Iwai authored
      Commit 4b239f45 ("x86-64, mm: Put early page table high") causes an S4
      regression since 2.6.39, namely that the machine occasionally reboots
      at S4 resume.  It doesn't always happen; the overall rate is about 1 in
      20.  But, like other bugs, once it happens it keeps happening.
      
      This patch fixes the problem by essentially reverting to the older way
      of assigning the memory.
      Signed-off-by: Takashi Iwai <tiwai@suse.de>
      Cc: <stable@kernel.org>
      Cc: Rafael J. Wysocki <rjw@sisk.pl>
      Cc: Yinghai Lu <yinghai.lu@oracle.com>
      [ We'll hopefully find the real fix, but that's too late for 3.1 now ]
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • ARM: 7133/1: SMP: fix per cpu timer setup before the cpu is marked online · eb047454
      Thomas Gleixner authored
      The problem is related to the early enabling of interrupts and the
      per cpu timer setup before the cpu is marked online. This doesn't
      need to be done in order to call calibrate_delay().
      
      calibrate_delay() monitors jiffies, which are updated from the CPU
      which is waiting for the new CPU to set the online bit.
      
      So calibrate_delay() can simply be called on the new CPU from within
      the interrupt-disabled region, and the local timer setup can be moved
      to after the cpu data has been stored and before interrupts are
      enabled.
      
      This solves both the cpu_online vs. cpu_active problem and the
      affinity setting of the per cpu timers.
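
      Schematically, the resulting bring-up order on the new CPU looks
      roughly like the sketch below (the wrapper name is made up and the
      helper names are approximations of the ARM SMP code of that era, not a
      verbatim excerpt of the patch):

        /* Runs on the freshly booted secondary CPU, interrupts still off. */
        static void secondary_bringup_sketch(unsigned int cpu)
        {
            /*
             * Safe with IRQs off here: jiffies are advanced by the CPU
             * that is waiting for this one to set its online bit.
             */
            calibrate_delay();

            smp_store_cpu_info(cpu);  /* store the cpu data first ...       */
            percpu_timer_setup();     /* ... then set up the per-cpu timer  */

            local_irq_enable();       /* interrupts come on only after that */
        }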
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Kukjin Kim <kgene.kim@samsung.com>
      Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
  10. 23 Oct 2011, 1 commit