1. 06 12月, 2014 1 次提交
    • A
      net: sock: allow eBPF programs to be attached to sockets · 89aa0758
      Alexei Starovoitov 提交于
      introduce new setsockopt() command:
      
      setsockopt(sock, SOL_SOCKET, SO_ATTACH_BPF, &prog_fd, sizeof(prog_fd))
      
      where prog_fd was received from syscall bpf(BPF_PROG_LOAD, attr, ...)
      and attr->prog_type == BPF_PROG_TYPE_SOCKET_FILTER
      
      setsockopt() calls bpf_prog_get() which increments refcnt of the program,
      so it doesn't get unloaded while socket is using the program.
      
      The same eBPF program can be attached to multiple sockets.
      
      User task exit automatically closes socket which calls sk_filter_uncharge()
      which decrements refcnt of eBPF program
      Signed-off-by: NAlexei Starovoitov <ast@plumgrid.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      89aa0758
  2. 12 11月, 2014 1 次提交
    • E
      net: introduce SO_INCOMING_CPU · 2c8c56e1
      Eric Dumazet 提交于
      Alternative to RPS/RFS is to use hardware support for multiple
      queues.
      
      Then split a set of million of sockets into worker threads, each
      one using epoll() to manage events on its own socket pool.
      
      Ideally, we want one thread per RX/TX queue/cpu, but we have no way to
      know after accept() or connect() on which queue/cpu a socket is managed.
      
      We normally use one cpu per RX queue (IRQ smp_affinity being properly
      set), so remembering on socket structure which cpu delivered last packet
      is enough to solve the problem.
      
      After accept(), connect(), or even file descriptor passing around
      processes, applications can use :
      
       int cpu;
       socklen_t len = sizeof(cpu);
      
       getsockopt(fd, SOL_SOCKET, SO_INCOMING_CPU, &cpu, &len);
      
      And use this information to put the socket into the right silo
      for optimal performance, as all networking stack should run
      on the appropriate cpu, without need to send IPI (RPS/RFS).
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2c8c56e1
  3. 22 10月, 2014 1 次提交
    • P
      powerpc: Wire up sys_bpf() syscall · fcbb539f
      Pranith Kumar 提交于
      This patch wires up the new syscall sys_bpf() on powerpc.
      
      Passes the tests in samples/bpf:
      
          #0 add+sub+mul OK
          #1 unreachable OK
          #2 unreachable2 OK
          #3 out of range jump OK
          #4 out of range jump2 OK
          #5 test1 ld_imm64 OK
          #6 test2 ld_imm64 OK
          #7 test3 ld_imm64 OK
          #8 test4 ld_imm64 OK
          #9 test5 ld_imm64 OK
          #10 no bpf_exit OK
          #11 loop (back-edge) OK
          #12 loop2 (back-edge) OK
          #13 conditional loop OK
          #14 read uninitialized register OK
          #15 read invalid register OK
          #16 program doesn't init R0 before exit OK
          #17 stack out of bounds OK
          #18 invalid call insn1 OK
          #19 invalid call insn2 OK
          #20 invalid function call OK
          #21 uninitialized stack1 OK
          #22 uninitialized stack2 OK
          #23 check valid spill/fill OK
          #24 check corrupted spill/fill OK
          #25 invalid src register in STX OK
          #26 invalid dst register in STX OK
          #27 invalid dst register in ST OK
          #28 invalid src register in LDX OK
          #29 invalid dst register in LDX OK
          #30 junk insn OK
          #31 junk insn2 OK
          #32 junk insn3 OK
          #33 junk insn4 OK
          #34 junk insn5 OK
          #35 misaligned read from stack OK
          #36 invalid map_fd for function call OK
          #37 don't check return value before access OK
          #38 access memory with incorrect alignment OK
          #39 sometimes access memory with incorrect alignment OK
          #40 jump test 1 OK
          #41 jump test 2 OK
          #42 jump test 3 OK
          #43 jump test 4 OK
      Signed-off-by: NPranith Kumar <bobby.prani@gmail.com>
      [mpe: test using samples/bpf]
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      fcbb539f
  4. 22 9月, 2014 2 次提交
  5. 09 9月, 2014 1 次提交
  6. 28 7月, 2014 2 次提交
  7. 11 6月, 2014 1 次提交
  8. 02 6月, 2014 1 次提交
  9. 30 5月, 2014 2 次提交
  10. 20 5月, 2014 1 次提交
    • J
      powerpc: Remove non-uapi linkage.h export · dade934a
      James Hogan 提交于
      The arch/powerpc/include/asm/linkage.h is being unintentionally exported
      in the kernel headers since commit e1b5bb6d (consolidate
      cond_syscall and SYSCALL_ALIAS declarations) when
      arch/powerpc/include/uapi/asm/linkage.h was deleted but the header-y not
      removed from the Kbuild file. This happens because Makefile.headersinst
      still checks the old asm/ directory if the specified header doesn't
      exist in the uapi directory.
      
      The asm/linkage.h shouldn't ever have been exported anyway. No other
      arch does and it doesn't contain anything useful to userland, so remove
      the header-y line from the Kbuild file which triggers the export.
      Signed-off-by: NJames Hogan <james.hogan@imgtec.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: linuxppc-dev@lists.ozlabs.org
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: David Howells <dhowells@redhat.com>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      dade934a
  11. 28 4月, 2014 1 次提交
  12. 23 4月, 2014 2 次提交
  13. 29 1月, 2014 1 次提交
  14. 27 1月, 2014 3 次提交
    • M
      KVM: PPC: Book3S HV: Add software abort codes for transactional memory · b17dfec0
      Michael Neuling 提交于
      This adds the software abort code defines for transactional memory (TM).
      These values are from PAPR.
      Signed-off-by: NMichael Neuling <mikey@neuling.org>
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      b17dfec0
    • P
      KVM: PPC: Book3S HV: Add support for DABRX register on POWER7 · 8563bf52
      Paul Mackerras 提交于
      The DABRX (DABR extension) register on POWER7 processors provides finer
      control over which accesses cause a data breakpoint interrupt.  It
      contains 3 bits which indicate whether to enable accesses in user,
      kernel and hypervisor modes respectively to cause data breakpoint
      interrupts, plus one bit that enables both real mode and virtual mode
      accesses to cause interrupts.  Currently, KVM sets DABRX to allow
      both kernel and user accesses to cause interrupts while in the guest.
      
      This adds support for the guest to specify other values for DABRX.
      PAPR defines a H_SET_XDABR hcall to allow the guest to set both DABR
      and DABRX with one call.  This adds a real-mode implementation of
      H_SET_XDABR, which shares most of its code with the existing H_SET_DABR
      implementation.  To support this, we add a per-vcpu field to store the
      DABRX value plus code to get and set it via the ONE_REG interface.
      
      For Linux guests to use this new hcall, userspace needs to add
      "hcall-xdabr" to the set of strings in the /chosen/hypertas-functions
      property in the device tree.  If userspace does this and then migrates
      the guest to a host where the kernel doesn't include this patch, then
      userspace will need to implement H_SET_XDABR by writing the specified
      DABR value to the DABR using the ONE_REG interface.  In that case, the
      old kernel will set DABRX to DABRX_USER | DABRX_KERNEL.  That should
      still work correctly, at least for Linux guests, since Linux guests
      cope with getting data breakpoint interrupts in modes that weren't
      requested by just ignoring the interrupt, and Linux guests never set
      DABRX_BTI.
      
      The other thing this does is to make H_SET_DABR and H_SET_XDABR work
      on POWER8, which has the DAWR and DAWRX instead of DABR/X.  Guests that
      know about POWER8 should use H_SET_MODE rather than H_SET_[X]DABR, but
      guests running in POWER7 compatibility mode will still use H_SET_[X]DABR.
      For them, this adds the logic to convert DABR/X values into DAWR/X values
      on POWER8.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      8563bf52
    • M
      KVM: PPC: Book3S HV: Context-switch new POWER8 SPRs · b005255e
      Michael Neuling 提交于
      This adds fields to the struct kvm_vcpu_arch to store the new
      guest-accessible SPRs on POWER8, adds code to the get/set_one_reg
      functions to allow userspace to access this state, and adds code to
      the guest entry and exit to context-switch these SPRs between host
      and guest.
      
      Note that DPDES (Directed Privileged Doorbell Exception State) is
      shared between threads on a core; hence we store it in struct
      kvmppc_vcore and have the master thread save and restore it.
      Signed-off-by: NMichael Neuling <mikey@neuling.org>
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      b005255e
  15. 19 1月, 2014 1 次提交
    • M
      net: introduce SO_BPF_EXTENSIONS · ea02f941
      Michal Sekletar 提交于
      For user space packet capturing libraries such as libpcap, there's
      currently only one way to check which BPF extensions are supported
      by the kernel, that is, commit aa1113d9 ("net: filter: return
      -EINVAL if BPF_S_ANC* operation is not supported"). For querying all
      extensions at once this might be rather inconvenient.
      
      Therefore, this patch introduces a new option which can be used as
      an argument for getsockopt(), and allows one to obtain information
      about which BPF extensions are supported by the current kernel.
      
      As David Miller suggests, we do not need to define any bits right
      now and status quo can just return 0 in order to state that this
      versions supports SKF_AD_PROTOCOL up to SKF_AD_PAY_OFFSET. Later
      additions to BPF extensions need to add their bits to the
      bpf_tell_extensions() function, as documented in the comment.
      Signed-off-by: NMichal Sekletar <msekleta@redhat.com>
      Cc: David Miller <davem@davemloft.net>
      Reviewed-by: NDaniel Borkmann <dborkman@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ea02f941
  16. 17 10月, 2013 8 次提交
    • B
      KVM: PPC: E500: Add userspace debug stub support · ce11e48b
      Bharat Bhushan 提交于
      This patch adds the debug stub support on booke/bookehv.
      Now QEMU debug stub can use hw breakpoint, watchpoint and
      software breakpoint to debug guest.
      
      This is how we save/restore debug register context when switching
      between guest, userspace and kernel user-process:
      
      When QEMU is running
       -> thread->debug_reg == QEMU debug register context.
       -> Kernel will handle switching the debug register on context switch.
       -> no vcpu_load() called
      
      QEMU makes ioctls (except RUN)
       -> This will call vcpu_load()
       -> should not change context.
       -> Some ioctls can change vcpu debug register, context saved in vcpu->debug_regs
      
      QEMU Makes RUN ioctl
       -> Save thread->debug_reg on STACK
       -> Store thread->debug_reg == vcpu->debug_reg
       -> load thread->debug_reg
       -> RUN VCPU ( So thread points to vcpu context )
      
      Context switch happens When VCPU running
       -> makes vcpu_load() should not load any context
       -> kernel loads the vcpu context as thread->debug_regs points to vcpu context.
      
      On heavyweight_exit
       -> Load the context saved on stack in thread->debug_reg
      
      Currently we do not support debug resource emulation to guest,
      On debug exception, always exit to user space irrespective of
      user space is expecting the debug exception or not. If this is
      unexpected exception (breakpoint/watchpoint event not set by
      userspace) then let us leave the action on user space. This
      is similar to what it was before, only thing is that now we
      have proper exit state available to user space.
      Signed-off-by: NBharat Bhushan <bharat.bhushan@freescale.com>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      ce11e48b
    • B
      KVM: PPC: E500: exit to user space on "ehpriv 1" instruction · b12c7841
      Bharat Bhushan 提交于
      "ehpriv 1" instruction is used for setting software breakpoints
      by user space. This patch adds support to exit to user space
      with "run->debug" have relevant information.
      
      As this is the first point we are using run->debug, also defined
      the run->debug structure.
      Signed-off-by: NBharat Bhushan <bharat.bhushan@freescale.com>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      b12c7841
    • P
      KVM: PPC: Book3S HV: Support POWER6 compatibility mode on POWER7 · 388cc6e1
      Paul Mackerras 提交于
      This enables us to use the Processor Compatibility Register (PCR) on
      POWER7 to put the processor into architecture 2.05 compatibility mode
      when running a guest.  In this mode the new instructions and registers
      that were introduced on POWER7 are disabled in user mode.  This
      includes all the VSX facilities plus several other instructions such
      as ldbrx, stdbrx, popcntw, popcntd, etc.
      
      To select this mode, we have a new register accessible through the
      set/get_one_reg interface, called KVM_REG_PPC_ARCH_COMPAT.  Setting
      this to zero gives the full set of capabilities of the processor.
      Setting it to one of the "logical" PVR values defined in PAPR puts
      the vcpu into the compatibility mode for the corresponding
      architecture level.  The supported values are:
      
      0x0f000002	Architecture 2.05 (POWER6)
      0x0f000003	Architecture 2.06 (POWER7)
      0x0f100003	Architecture 2.06+ (POWER7+)
      
      Since the PCR is per-core, the architecture compatibility level and
      the corresponding PCR value are stored in the struct kvmppc_vcore, and
      are therefore shared between all vcpus in a virtual core.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      [agraf: squash in fix to add missing break statements and documentation]
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      388cc6e1
    • P
      KVM: PPC: Book3S HV: Add support for guest Program Priority Register · 4b8473c9
      Paul Mackerras 提交于
      POWER7 and later IBM server processors have a register called the
      Program Priority Register (PPR), which controls the priority of
      each hardware CPU SMT thread, and affects how fast it runs compared
      to other SMT threads.  This priority can be controlled by writing to
      the PPR or by use of a set of instructions of the form or rN,rN,rN
      which are otherwise no-ops but have been defined to set the priority
      to particular levels.
      
      This adds code to context switch the PPR when entering and exiting
      guests and to make the PPR value accessible through the SET/GET_ONE_REG
      interface.  When entering the guest, we set the PPR as late as
      possible, because if we are setting a low thread priority it will
      make the code run slowly from that point on.  Similarly, the
      first-level interrupt handlers save the PPR value in the PACA very
      early on, and set the thread priority to the medium level, so that
      the interrupt handling code runs at a reasonable speed.
      Acked-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      4b8473c9
    • P
      KVM: PPC: Book3S HV: Store LPCR value for each virtual core · a0144e2a
      Paul Mackerras 提交于
      This adds the ability to have a separate LPCR (Logical Partitioning
      Control Register) value relating to a guest for each virtual core,
      rather than only having a single value for the whole VM.  This
      corresponds to what real POWER hardware does, where there is a LPCR
      per CPU thread but most of the fields are required to have the same
      value on all active threads in a core.
      
      The per-virtual-core LPCR can be read and written using the
      GET/SET_ONE_REG interface.  Userspace can can only modify the
      following fields of the LPCR value:
      
      DPFD	Default prefetch depth
      ILE	Interrupt little-endian
      TC	Translation control (secondary HPT hash group search disable)
      
      We still maintain a per-VM default LPCR value in kvm->arch.lpcr, which
      contains bits relating to memory management, i.e. the Virtualized
      Partition Memory (VPM) bits and the bits relating to guest real mode.
      When this default value is updated, the update needs to be propagated
      to the per-vcore values, so we add a kvmppc_update_lpcr() helper to do
      that.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      [agraf: fix whitespace]
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      a0144e2a
    • P
      KVM: PPC: Book3S: Add GET/SET_ONE_REG interface for VRSAVE · c0867fd5
      Paul Mackerras 提交于
      The VRSAVE register value for a vcpu is accessible through the
      GET/SET_SREGS interface for Book E processors, but not for Book 3S
      processors.  In order to make this accessible for Book 3S processors,
      this adds a new register identifier for GET/SET_ONE_REG, and adds
      the code to implement it.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      c0867fd5
    • P
      KVM: PPC: Book3S HV: Implement timebase offset for guests · 93b0f4dc
      Paul Mackerras 提交于
      This allows guests to have a different timebase origin from the host.
      This is needed for migration, where a guest can migrate from one host
      to another and the two hosts might have a different timebase origin.
      However, the timebase seen by the guest must not go backwards, and
      should go forwards only by a small amount corresponding to the time
      taken for the migration.
      
      Therefore this provides a new per-vcpu value accessed via the one_reg
      interface using the new KVM_REG_PPC_TB_OFFSET identifier.  This value
      defaults to 0 and is not modified by KVM.  On entering the guest, this
      value is added onto the timebase, and on exiting the guest, it is
      subtracted from the timebase.
      
      This is only supported for recent POWER hardware which has the TBU40
      (timebase upper 40 bits) register.  Writing to the TBU40 register only
      alters the upper 40 bits of the timebase, leaving the lower 24 bits
      unchanged.  This provides a way to modify the timebase for guest
      migration without disturbing the synchronization of the timebase
      registers across CPU cores.  The kernel rounds up the value given
      to a multiple of 2^24.
      
      Timebase values stored in KVM structures (struct kvm_vcpu, struct
      kvmppc_vcore, etc.) are stored as host timebase values.  The timebase
      values in the dispatch trace log need to be guest timebase values,
      however, since that is read directly by the guest.  This moves the
      setting of vcpu->arch.dec_expires on guest exit to a point after we
      have restored the host timebase so that vcpu->arch.dec_expires is a
      host timebase value.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      93b0f4dc
    • M
      KVM: PPC: Book3S HV: Reserve POWER8 space in get/set_one_reg · 3b783474
      Michael Neuling 提交于
      This reserves space in get/set_one_reg ioctl for the extra guest state
      needed for POWER8.  It doesn't implement these at all, it just reserves
      them so that the ABI is defined now.
      
      A few things to note here:
      
      - This add *a lot* state for transactional memory.  TM suspend mode,
        this is unavoidable, you can't simply roll back all transactions and
        store only the checkpointed state.  I've added this all to
        get/set_one_reg (including GPRs) rather than creating a new ioctl
        which returns a struct kvm_regs like KVM_GET_REGS does.  This means we
        if we need to extract the TM state, we are going to need a bucket load
        of IOCTLs.  Hopefully most of the time this will not be needed as we
        can look at the MSR to see if TM is active and only grab them when
        needed.  If this becomes a bottle neck in future we can add another
        ioctl to grab all this state in one go.
      
      - The TM state is offset by 0x80000000.
      
      - For TM, I've done away with VMX and FP and created a single 64x128 bit
        VSX register space.
      
      - I've left a space of 1 (at 0x9c) since Paulus needs to add a value
        which applies to POWER7 as well.
      Signed-off-by: NMichael Neuling <mikey@neuling.org>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      3b783474
  17. 11 10月, 2013 1 次提交
  18. 29 9月, 2013 1 次提交
    • E
      net: introduce SO_MAX_PACING_RATE · 62748f32
      Eric Dumazet 提交于
      As mentioned in commit afe4fd06 ("pkt_sched: fq: Fair Queue packet
      scheduler"), this patch adds a new socket option.
      
      SO_MAX_PACING_RATE offers the application the ability to cap the
      rate computed by transport layer. Value is in bytes per second.
      
      u32 val = 1000000;
      setsockopt(sockfd, SOL_SOCKET, SO_MAX_PACING_RATE, &val, sizeof(val));
      
      To be effectively paced, a flow must use FQ packet scheduler.
      
      Note that a packet scheduler takes into account the headers for its
      computations. The effective payload rate depends on MSS and retransmits
      if any.
      
      I chose to make this pacing rate a SOL_SOCKET option instead of a
      TCP one because this can be used by other protocols.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Steinar H. Gunderson <sesse@google.com>
      Cc: Michael Kerrisk <mtk.manpages@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      62748f32
  19. 14 8月, 2013 2 次提交
  20. 01 8月, 2013 1 次提交
  21. 11 7月, 2013 1 次提交
  22. 18 6月, 2013 1 次提交
  23. 01 6月, 2013 1 次提交
  24. 06 5月, 2013 1 次提交
  25. 02 5月, 2013 1 次提交
    • P
      KVM: PPC: Book3S: Add API for in-kernel XICS emulation · 5975a2e0
      Paul Mackerras 提交于
      This adds the API for userspace to instantiate an XICS device in a VM
      and connect VCPUs to it.  The API consists of a new device type for
      the KVM_CREATE_DEVICE ioctl, a new capability KVM_CAP_IRQ_XICS, which
      functions similarly to KVM_CAP_IRQ_MPIC, and the KVM_IRQ_LINE ioctl,
      which is used to assert and deassert interrupt inputs of the XICS.
      
      The XICS device has one attribute group, KVM_DEV_XICS_GRP_SOURCES.
      Each attribute within this group corresponds to the state of one
      interrupt source.  The attribute number is the same as the interrupt
      source number.
      
      This does not support irq routing or irqfd yet.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Acked-by: NDavid Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      5975a2e0
  26. 27 4月, 2013 1 次提交