1. 23 January 2015, 8 commits
  2. 12 December 2014, 2 commits
    • arch: Add lightweight memory barriers dma_rmb() and dma_wmb() · 1077fa36
      Committed by Alexander Duyck
      There are a number of situations where the mandatory barriers rmb() and
      wmb() are used to order memory/memory operations in the device drivers
      and those barriers are much heavier than they actually need to be.  For
      example in the case of PowerPC wmb() calls the heavy-weight sync
      instruction when for coherent memory operations all that is really needed
      is an lwsync or eieio instruction.
      
      This commit adds a coherent only version of the mandatory memory barriers
      rmb() and wmb().  In most cases this should result in the barrier being the
      same as the SMP barriers for the SMP case, however in some cases we use a
      barrier that is somewhere in between rmb() and smp_rmb().  For example on
      ARM the rmb barriers break down as follows:
      
        Barrier   Call     Explanation
        --------- -------- ----------------------------------
        rmb()     dsb()    Data synchronization barrier - system
        dma_rmb() dmb(osh) Data memory barrier - outer shareable
        smp_rmb() dmb(ish) Data memory barrier - inner shareable
      
      These new barriers are not as safe as the standard rmb() and wmb().
      Specifically they do not guarantee ordering between coherent and incoherent
      memories.  The primary use case for these would be to enforce ordering of
      reads and writes when accessing coherent memory that is shared between the
      CPU and a device.
      
      It may also be noted that there is no dma_mb().  Most architectures don't
      provide a good mechanism for performing a coherent only full barrier without
      resorting to the same mechanism used in mb().  As such there isn't much to
      be gained in trying to define such a function.
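
      For drivers the intended usage pattern looks roughly like this (a hedged
      sketch only; the descriptor layout, the RX_DESC_DONE flag and the
      process_packet() helper are made up for illustration):

        #include <linux/types.h>
        #include <asm/barrier.h>

        /* Hypothetical RX descriptor living in coherent DMA memory. */
        struct rx_desc {
                u32 status;             /* written last by the device */
                u32 len;
                u64 addr;
        };

        static bool poll_rx(struct rx_desc *desc)
        {
                /* RX_DESC_DONE and process_packet() are driver placeholders */
                if (!(desc->status & RX_DESC_DONE))
                        return false;

                /*
                 * Order the read of the DONE bit against the reads of the
                 * rest of the descriptor.  Coherent memory only, so
                 * dma_rmb() is sufficient where rmb() would be overkill.
                 */
                dma_rmb();

                process_packet(desc->addr, desc->len);
                return true;
        }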
      
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
      Cc: Michael Ellerman <michael@ellerman.id.au>
      Cc: Michael Neuling <mikey@neuling.org>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: David Miller <davem@davemloft.net>
      Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Acked-by: Will Deacon <will.deacon@arm.com>
      Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • arch: Cleanup read_barrier_depends() and comments · 8a449718
      Committed by Alexander Duyck
      This patch is meant to clean up the handling of read_barrier_depends and
      smp_read_barrier_depends.  In multiple spots in the kernel headers
      read_barrier_depends is defined as "do {} while (0)"; however, the
      SMP vs non-SMP sections then have the SMP version reference
      read_barrier_depends while the non-SMP version defines it as yet
      another empty do/while.
      
      With this commit I went through and cleaned out the duplicate definitions
      and reduced the number of definitions to two per header.  In addition I
      moved the 50-line comment for the macro from the x86 and mips headers,
      which defined it as an empty do/while, to those that actually define the
      macro: alpha and blackfin.
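
      The resulting per-header layout looks roughly like this (a simplified
      sketch, not the verbatim header contents):

        /* Most architectures: both are no-ops; alpha (and blackfin for the
         * SMP case) override read_barrier_depends() with a real barrier. */
        #define read_barrier_depends()          do { } while (0)

        #ifdef CONFIG_SMP
        #define smp_read_barrier_depends()      read_barrier_depends()
        #else
        #define smp_read_barrier_depends()      do { } while (0)
        #endif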
      Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  3. 11 December 2014, 1 commit
  4. 08 December 2014, 5 commits
  5. 06 December 2014, 1 commit
    • net: sock: allow eBPF programs to be attached to sockets · 89aa0758
      Committed by Alexei Starovoitov
      Introduce a new setsockopt() command:
      
      setsockopt(sock, SOL_SOCKET, SO_ATTACH_BPF, &prog_fd, sizeof(prog_fd))
      
      where prog_fd was received from the bpf(BPF_PROG_LOAD, attr, ...) syscall
      with attr->prog_type == BPF_PROG_TYPE_SOCKET_FILTER.
      
      setsockopt() calls bpf_prog_get(), which increments the refcount of the
      program so that it cannot be unloaded while a socket is using it.
      
      The same eBPF program can be attached to multiple sockets.
      
      When the user task exits, its sockets are closed automatically; this calls
      sk_filter_uncharge(), which decrements the refcount of the eBPF program.
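
      A minimal user-space sequence would look like this (error handling
      omitted; insns/insn_cnt stand for an already prepared struct bpf_insn
      array, and ptr_to_u64() is the usual user-defined cast-to-u64 helper):

        #include <unistd.h>
        #include <sys/syscall.h>
        #include <sys/socket.h>
        #include <linux/bpf.h>

        union bpf_attr attr = {
                .prog_type = BPF_PROG_TYPE_SOCKET_FILTER,
                .insns     = ptr_to_u64(insns),        /* struct bpf_insn array */
                .insn_cnt  = insn_cnt,
                .license   = ptr_to_u64("GPL"),
        };

        int prog_fd = syscall(__NR_bpf, BPF_PROG_LOAD, &attr, sizeof(attr));

        setsockopt(sock, SOL_SOCKET, SO_ATTACH_BPF, &prog_fd, sizeof(prog_fd));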
      Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  6. 28 November 2014, 6 commits
  7. 19 November 2014, 4 commits
  8. 12 November 2014, 1 commit
    • net: introduce SO_INCOMING_CPU · 2c8c56e1
      Committed by Eric Dumazet
      An alternative to RPS/RFS is to use hardware support for multiple
      queues.

      Then split a set of millions of sockets across worker threads, each
      one using epoll() to manage events on its own socket pool.
      
      Ideally, we want one thread per RX/TX queue/cpu, but we have no way to
      know after accept() or connect() on which queue/cpu a socket is managed.
      
      We normally use one cpu per RX queue (IRQ smp_affinity being properly
      set), so remembering in the socket structure which cpu delivered the last
      packet is enough to solve the problem.
      
      After accept(), connect(), or even file descriptor passing between
      processes, applications can use:
      
       int cpu;
       socklen_t len = sizeof(cpu);
      
       getsockopt(fd, SOL_SOCKET, SO_INCOMING_CPU, &cpu, &len);
      
      And use this information to put the socket into the right silo for
      optimal performance, as the whole networking stack then runs on the
      appropriate cpu, with no need to send IPIs (RPS/RFS).
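
      In an accept loop this typically ends up as (a sketch only; hand_off_fd(),
      worker_for_cpu() and any_worker() are application-side placeholders, not
      part of this patch):

        int fd = accept(listen_fd, NULL, NULL);
        int cpu = -1;
        socklen_t len = sizeof(cpu);

        if (getsockopt(fd, SOL_SOCKET, SO_INCOMING_CPU, &cpu, &len) == 0 && cpu >= 0)
                hand_off_fd(worker_for_cpu(cpu), fd);  /* keep fd in its RX cpu's silo */
        else
                hand_off_fd(any_worker(), fd);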
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  9. 10 November 2014, 1 commit
    • /dev/mem: Use more consistent data types · 4707a341
      Committed by Thierry Reding
      The xlate_dev_{kmem,mem}_ptr() functions take either a physical address
      or a kernel virtual address, so data types should be phys_addr_t and
      void *. They both return a kernel virtual address which is only ever
      used in calls to copy_{from,to}_user(), so make variables that store it
      void * rather than char * for consistency.
      
      Also, only define a weak unxlate_dev_mem_ptr() function if the
      architecture hasn't overridden it in the asm/io.h header file.
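
      The guard in drivers/char/mem.c then looks roughly like this (a sketch
      of the mechanism described above):

        /* Only provide the weak no-op default when asm/io.h did not
         * already define its own unxlate_dev_mem_ptr(). */
        #ifndef unxlate_dev_mem_ptr
        #define unxlate_dev_mem_ptr unxlate_dev_mem_ptr
        void __weak unxlate_dev_mem_ptr(phys_addr_t phys, void *addr)
        {
        }
        #endif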
      Signed-off-by: Thierry Reding <treding@nvidia.com>
  10. 03 November 2014, 3 commits
    • s390/pci: add sparse annotations · 5b9f2081
      Committed by Martin Schwidefsky
      Fix the following warnings from the sparse code checker:
      
      arch/s390/include/asm/pci_io.h:165:49: warning: cast removes address space of expression
      arch/s390/pci/pci.c:476:44: warning: cast removes address space of expression
      arch/s390/pci/pci.c:491:36: warning: incorrect type in argument 2 (different address spaces)
      arch/s390/pci/pci.c:491:36:    expected void [noderef] <asn:2>*addr
      arch/s390/pci/pci.c:491:36:    got void *<noident>
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
    • s390/pci: improve irq number check for msix · b19148f6
      Committed by Sebastian Ott
      s390's arch_setup_msi_irqs function ensures that we don't return with
      more irqs than the PCI architecture allows and that a single PCI
      function doesn't consume more irqs than the kernel is configured for.

      The latter check doesn't help much, since it would have to take the sum
      of all irqs into account; as that is already done by irq_alloc_desc, we
      can remove this check.
      
      As for the first check, we should use the value provided by the
      firmware, which can be less than what the PCI architecture allows.
      Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
    • s390/cmpxchg: use compiler builtins · f318a122
      Committed by Martin Schwidefsky
      The kernel build for s390 fails with gcc 3.x compilers, so set the
      minimum required version of gcc to 4.3.
      
      As the atomic builtins are available with all gcc 4.x compilers,
      use the __sync_val_compare_and_swap and __sync_bool_compare_and_swap
      functions to replace the complex macro and inline assembler magic
      in include/asm/cmpxchg.h. The compiler can just do it and generates
      better code with the builtins.
      
      While we are at it use __sync_bool_compare_and_swap for the
      _raw_compare_and_swap function in the spinlock code as well.
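
      The core of the change boils down to something like this (a condensed
      sketch of the idea, not the exact header contents):

        /* cmpxchg() via the compiler builtin instead of inline assembly */
        #define cmpxchg(ptr, old, new)                                  \
                __sync_val_compare_and_swap((ptr), (old), (new))

        /* spinlock code: the boolean variant fits _raw_compare_and_swap() */
        static inline int _raw_compare_and_swap(unsigned int *lock,
                                                unsigned int old, unsigned int new)
        {
                return __sync_bool_compare_and_swap(lock, old, new);
        }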
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
  11. 28 October 2014, 3 commits
  12. 27 October 2014, 5 commits
    • s390/mm: pmdp_get_and_clear_full optimization · fcbe08d6
      Committed by Martin Schwidefsky
      Analogous to ptep_get_and_clear_full, define a variant of the
      pmdp_get_and_clear primitive which gets the full hint from the
      mmu_gather struct. This allows s390 to avoid a costly instruction
      when destroying an address space.
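
      A sketch of the generic fallback (architectures such as s390 override it
      and use the full hint to avoid the costly flush):

        #ifndef __HAVE_ARCH_PMDP_GET_AND_CLEAR_FULL
        static inline pmd_t pmdp_get_and_clear_full(struct mm_struct *mm,
                                                    unsigned long addr,
                                                    pmd_t *pmdp, int full)
        {
                /* generic fallback: the full hint is simply ignored */
                return pmdp_get_and_clear(mm, addr, pmdp);
        }
        #endif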
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
    • s390/ftrace,kprobes: allow to patch first instruction · c933146a
      Committed by Heiko Carstens
      If the function tracer is enabled, allow setting kprobes on the first
      instruction of a function (which is the function trace caller):

      If no kprobe is set, enabling and disabling function tracing of a
      function simply patches the first instruction. Either it is a nop
      (right now an unconditional branch which skips the mcount block),
      or it is a branch to the ftrace_caller() function.
      
      If a kprobe is being placed on a function tracer calling instruction,
      we encode whether we actually have a nop or a branch in the remaining
      bytes after the breakpoint instruction (illegal opcode).
      This is possible, since the size of the instruction used for the nop
      and branch is six bytes, while the size of the breakpoint is only
      two bytes.
      Therefore the first two bytes contain the illegal opcode and the last
      four bytes contain either "0" for nop or "1" for branch. The kprobes
      code will then execute/simulate the correct instruction.
      
      Instruction patching for kprobes and function tracer is always done
      with stop_machine(). Therefore we don't have any races where an
      instruction is patched concurrently on a different cpu.
      Besides that, the program check handler which executes the function
      trace caller instruction won't be executed concurrently with any
      stop_machine() execution.
      
      This allows keeping the full fault-based kprobes handling, which
      generates correct pt_regs contents automatically.
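
      Schematically, the six bytes at the function entry are reused like this
      while a kprobe is armed (illustration only; the struct name is made up):

        /* Illustration of the 6-byte encoding described above. */
        struct kprobe_ftrace_insn {             /* hypothetical name */
                u16 breakpoint;                 /* illegal opcode, triggers the kprobe */
                u32 orig_kind;                  /* 0 = nop, 1 = branch to ftrace_caller() */
        } __packed;                             /* same size as the patched instruction */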
      Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
    • s390/mm: disable KSM for storage key enabled pages · 3ac8e380
      Committed by Dominik Dingel
      When storage keys are enabled, unmerge already merged pages and prevent
      new pages from being merged.
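
      A condensed sketch of the unmerge side, built on the existing
      ksm_madvise() helper (simplified; the real code also has to make the
      "prevent new merges" part stick for the mm):

        #include <linux/ksm.h>
        #include <linux/mm.h>
        #include <linux/mman.h>

        /* sketch only: unmerge every VMA of the given mm */
        static int unmerge_all_ksm_pages(struct mm_struct *mm)
        {
                struct vm_area_struct *vma;
                int rc = 0;

                down_write(&mm->mmap_sem);
                for (vma = mm->mmap; vma && !rc; vma = vma->vm_next)
                        rc = ksm_madvise(vma, vma->vm_start, vma->vm_end,
                                         MADV_UNMERGEABLE, &vma->vm_flags);
                up_write(&mm->mmap_sem);
                return rc;
        }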
      Signed-off-by: Dominik Dingel <dingel@linux.vnet.ibm.com>
      Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
      Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
    • s390/mm: prevent and break zero page mappings in case of storage keys · 2faee8ff
      Committed by Dominik Dingel
      As soon as storage keys are enabled, we need to stop working on zero page
      mappings to prevent inconsistencies between storage keys and pgste.
      
      Otherwise the following data corruption could happen:
      1) guest enables storage key
      2) guest sets storage key for not mapped page X
         -> change goes to PGSTE
      3) guest reads from page X
         -> as X was not dirty before, the page will be zero page backed,
            storage key from PGSTE for X will go to storage key for zero page
      4) guest sets storage key for not mapped page Y (same logic as above)
      5) guest reads from page Y
         -> as Y was not dirty before, the page will be zero page backed,
             storage key from PGSTE for Y will go to storage key for zero page
            overwriting storage key for X
      
      While holding the mmap sem, we are safe against changes on entries we
      already fixed, as every fault would need to take the mmap_sem (read).
      
      Other vCPUs executing storage key instructions will get a one-time
      interception and will also be serialized by the mmap_sem.
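
      On the common-code side this is wired up through a per-mm hook; a
      simplified sketch (separate snippets, not one compilable unit):

        /* include/linux/mm.h: default - the zero page is always allowed */
        #ifndef mm_forbids_zeropage
        #define mm_forbids_zeropage(mm)         (0)
        #endif

        /* s390 override: no zero pages once storage keys are in use */
        #define mm_forbids_zeropage(mm)         ((mm)->context.use_skey)

        /* fault path (simplified): only install the zero page when allowed */
        if (!(flags & FAULT_FLAG_WRITE) && !mm_forbids_zeropage(vma->vm_mm))
                entry = pte_mkspecial(pfn_pte(my_zero_pfn(address),
                                              vma->vm_page_prot));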
      Signed-off-by: Dominik Dingel <dingel@linux.vnet.ibm.com>
      Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
    • s390/mm: refactor global pgste updates · a13cff31
      Committed by Dominik Dingel
      Replace the s390-specific page table walker for the pgste updates
      with a call to the common walk_page_range() function.
      There are now two pte modification functions: one for the reset
      of the CMMA state and another one for the initialization of the
      storage keys.
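
      The new code follows the standard walk_page_range() pattern; a condensed
      sketch of the storage key side (pgste manipulation details omitted):

        static int __s390_enable_skey(pte_t *pte, unsigned long addr,
                                      unsigned long next, struct mm_walk *walk)
        {
                /* pgste_get_lock(), set the storage key state,
                 * pgste_set_unlock() -- omitted in this sketch */
                return 0;
        }

        int s390_enable_skey(void)
        {
                struct mm_walk walk = {
                        .pte_entry = __s390_enable_skey,
                        .mm        = current->mm,
                };

                down_write(&current->mm->mmap_sem);
                walk_page_range(0, TASK_SIZE, &walk);
                up_write(&current->mm->mmap_sem);
                return 0;
        }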
      Signed-off-by: Dominik Dingel <dingel@linux.vnet.ibm.com>
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>