1. 10 Sep, 2014 15 commits
    • tcp: remove dst refcount false sharing for prequeue mode · ca777eff
      Authored by Eric Dumazet
      Alexander Duyck reported high false sharing on the dst refcount in the TCP
      stack when prequeue is used. Prequeue is the mechanism used when a thread is
      blocked in recvmsg()/read() on a TCP socket, using a blocking model
      rather than the non-blocking select()/poll()/epoll() one.
      
      We already try to use RCU in the input path as much as possible, but we were
      forced to take a refcount on the dst when an skb escaped the RCU-protected
      region. When/if the user thread runs on a different CPU, dst_release()
      will then touch the dst refcount again.
      
      Commit 09316255 (tcp: force a dst refcount when prequeue packet)
      was an example of a race fix.
      
      It turns out the only remaining usage of skb->dst for a packet stored
      in a TCP socket prequeue is IP early demux.
      
      We can add logic to detect when IP early demux is probably going
      to use skb->dst. Because we do an optimistic check rather than duplicate
      existing logic, we need to guard inet_sk_rx_dst_set() and
      inet6_sk_rx_dst_set() from using a NULL dst.
      
      Many thanks to Alexander for providing a nice bug report, git bisection,
      and reproducer.
      
      Tested using Alexander's script on a 40Gb NIC with 8 RX queues.
      Hosts have 24 cores, 48 hyper threads.
      
      echo 0 >/proc/sys/net/ipv4/tcp_autocorking
      
      for i in `seq 0 47`
      do
        for j in `seq 0 2`
        do
           netperf -H $DEST -t TCP_STREAM -l 1000 \
                   -c -C -T $i,$i -P 0 -- \
                   -m 64 -s 64K -D &
        done
      done
      
      Before patch : ~6Mpps and ~95% cpu usage on receiver
      After patch : ~9Mpps and ~35% cpu usage on receiver.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: Alexander Duyck <alexander.h.duyck@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
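The NULL guard described in this commit message can be pictured with a small userspace sketch. The types `struct dst_sketch`, `struct skb_sketch` and `struct sock_sketch`, and the function `rx_dst_set_sketch()`, are hypothetical stand-ins rather than the kernel's definitions, and the plain integer refcount is a simplification of the kernel's atomic one. The sketch only shows the shape of the check: the socket's cached route is updated when the packet still carries a dst, and left alone otherwise.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's dst_entry, sk_buff and sock. */
struct dst_sketch  { int refcnt; };
struct skb_sketch  { struct dst_sketch *dst; int iif; };
struct sock_sketch { struct dst_sketch *rx_dst; int rx_dst_ifindex; };

/* Cache the packet's route on the socket, but only if the packet has one.
 * Returns 1 when the cache was updated, 0 when the packet's dst was NULL. */
int rx_dst_set_sketch(struct sock_sketch *sk, const struct skb_sketch *skb)
{
	struct dst_sketch *dst;

	assert(sk != NULL && skb != NULL);
	dst = skb->dst;

	if (dst == NULL)
		return 0;              /* no dst on the packet: nothing to cache */

	dst->refcnt++;                 /* the socket now holds its own reference */
	sk->rx_dst = dst;
	sk->rx_dst_ifindex = skb->iif;
	return 1;
}
```

Calling `rx_dst_set_sketch()` with a packet whose `dst` is NULL leaves the socket untouched, which is the behaviour the guard is meant to provide.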
    • ath5k: Add missing vmalloc.h include. · 32bc6d1a
      Authored by Stephen Rothwell
      After merging the wireless-next tree, today's linux-next build (powerpc
      allyesconfig) failed like this:
      
      drivers/net/wireless/ath/ath5k/debug.c: In function 'open_file_eeprom':
      drivers/net/wireless/ath/ath5k/debug.c:933:2: error: implicit declaration of function 'vmalloc' [-Werror=implicit-function-declaration]
        buf = vmalloc(eesize);
        ^
      drivers/net/wireless/ath/ath5k/debug.c:933:6: warning: assignment makes pointer from integer without a cast
        buf = vmalloc(eesize);
            ^
      drivers/net/wireless/ath/ath5k/debug.c:960:2: error: implicit declaration of function 'vfree' [-Werror=implicit-function-declaration]
        vfree(buf);
        ^
      
      Caused by commit db906eb2 ("ath5k: added debugfs file for dumping
      eeprom").  Also reported by Guenter Roeck.
      
      I have used Geert Uytterhoeven's suggested fix of including vmalloc.h
      and so added this patch for today:
      
      From: Stephen Rothwell <sfr@canb.auug.org.au>
      Date: Mon, 8 Sep 2014 18:39:23 +1000
      Subject: [PATCH] ath5k: fix debugfs addition
      Reported-by: Guenter Roeck <linux@roeck-us.net>
      Suggested-by: Geert Uytterhoeven <geert@linux-m68k.org>
      Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ethernet: ti: remove unwanted THIS_MODULE macro · c9104b04
      Authored by Varka Bhadram
      This removes the explicit assignment of the driver structure's owner field.
      module_platform_driver() sets it automatically.
      Signed-off-by: Varka Bhadram <varkab@cdac.in>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • openvswitch: change the data type of error status to atomic_long_t · e403aded
      Authored by Li RongQing
      Change the data type of the error status from u64 to atomic_long_t and use
      atomic operations, then remove the lock that was used to protect the error status.

      Atomic operations may be faster than taking a spin lock.

      Cc: Pravin Shelar <pshelar@nicira.com>
      Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
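As a rough userspace analogy for this change, the sketch below keeps an error counter in a C11 `atomic_long` and updates it without any lock. The struct and function names (`err_stats_sketch`, `err_stats_inc_rx()`, and so on) are hypothetical, and C11 `<stdatomic.h>` is used here as a stand-in for the kernel's `atomic_long_t` API, which has different function names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical error counter; the field name is illustrative only. */
struct err_stats_sketch {
	atomic_long rx_errors;
};

/* Set the counter to zero before first use. */
void err_stats_init(struct err_stats_sketch *s)
{
	assert(s != NULL);
	atomic_init(&s->rx_errors, 0);
}

/* Lock-free increment: no spin lock is taken around the update. */
void err_stats_inc_rx(struct err_stats_sketch *s)
{
	atomic_fetch_add_explicit(&s->rx_errors, 1, memory_order_relaxed);
}

/* Read the current value; relaxed ordering is enough for a statistic. */
long err_stats_read_rx(struct err_stats_sketch *s)
{
	return atomic_load_explicit(&s->rx_errors, memory_order_relaxed);
}
```

Relaxed ordering is chosen on the assumption that the counter is a pure statistic and does not publish other data; a counter used for synchronization would need stronger ordering.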
    • bridge: Cleanup of unncessary check. · 5aaa62d6
      Authored by Rami Rosen
      This patch removes an unnecessary check in the br_afspec() method of
      br_netlink.c.
      Signed-off-by: Rami Rosen <ramirose@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'bridge_rtnl_link' · 8b86f7f3
      Authored by David S. Miller
      Jiri Pirko says:

      ====================
      bridge: implement rtnl_link options for getting and setting bridge options

      So far, sysfs is the only complete interface for getting and setting bridge
      options. This patchset follows up on the similar bonding code and
      allows userspace to get/set bridge master/port options using the Netlink
      IFLA_INFO_DATA/IFLA_INFO_SLAVE_DATA attributes.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bridge: implement rtnl_link_ops->changelink · 13323516
      Authored by Jiri Pirko
      Allow rtnetlink users to set bridge master info via the IFLA_INFO_DATA attr.
      This initial part implements the forward_delay, hello_time and max_age options.
      Signed-off-by: Jiri Pirko <jiri@resnulli.us>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bridge: implement rtnl_link_ops->get_size and rtnl_link_ops->fill_info · e5c3ea5c
      Authored by Jiri Pirko
      Allow rtnetlink users to get bridge master info in the IFLA_INFO_DATA attr.
      This initial part implements the forward_delay, hello_time and max_age options.
      Signed-off-by: Jiri Pirko <jiri@resnulli.us>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bridge: implement rtnl_link_ops->slave_changelink · 3ac636b8
      Authored by Jiri Pirko
      Allow rtnetlink users to set port info via the IFLA_INFO_SLAVE_DATA attr.
      Signed-off-by: Jiri Pirko <jiri@resnulli.us>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bridge: implement rtnl_link_ops->get_slave_size and rtnl_link_ops->fill_slave_info · ced8283f
      Authored by Jiri Pirko
      Allow rtnetlink users to get port info in the IFLA_INFO_SLAVE_DATA attr.
      Signed-off-by: Jiri Pirko <jiri@resnulli.us>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bridge: switch order of rx_handler reg and upper dev link · 0f49579a
      Authored by Jiri Pirko
      netdev_master_upper_dev_link() calls
      call_netdevice_notifiers(NETDEV_CHANGEUPPER, dev). That generates an rtnl
      link message, during which rtnl_link_ops->fill_slave_info is called.
      But with the current ordering, rx_handler and IFF_BRIDGE_PORT are not set
      yet, so the fill_slave_info callback would have to check for that.

      Resolve this by reordering to match what bonding and team do, which
      avoids the check.

      Also add removal of the IFF_BRIDGE_PORT flag to the error path.
      Signed-off-by: Jiri Pirko <jiri@resnulli.us>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/ipv4: bind ip_nonlocal_bind to current netns · 49a60158
      Authored by Vincent Bernat
      The net.ipv4.ip_nonlocal_bind sysctl was global to all network
      namespaces. This patch allows a different value to be set for each
      network namespace.
      Signed-off-by: Vincent Bernat <vincent@bernat.im>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'ebpf' · afddacc3
      Authored by David S. Miller
      Alexei Starovoitov says:
      
      ====================
      load imm64 insn and uapi/linux/bpf.h
      
      V9->V10
      - no changes, added Daniel's ack
      
      Note they're on top of Hannes's patch in the same area [1]
      
      V8 thread with 'why' reasoning and end goal [2]
      
      Original set [3] of ~28 patches I'm planning to present in 4 stages:
      
        I. these 2 patches to fork off llvm upstreaming
       II. bpf syscall with manpage and map implementation
      III. bpf program load/unload with verifier testsuite (1st user of
           instruction macros from bpf.h and 1st user of load imm64 insn)
       IV. tracing, etc
      
      [1] http://patchwork.ozlabs.org/patch/385266/
      [2] https://lkml.org/lkml/2014/8/27/628
      [3] https://lkml.org/lkml/2014/8/26/859
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: filter: split filter.h and expose eBPF to user space · daedfb22
      Authored by Alexei Starovoitov
      allow user space to generate eBPF programs
      
      uapi/linux/bpf.h: eBPF instruction set definition
      
      linux/filter.h: the rest
      
      This patch only moves macro definitions, but practically it freezes existing
      eBPF instruction set, though new instructions can still be added in the future.
      
      These eBPF definitions cannot go into uapi/linux/filter.h, since the names
      may conflict with existing applications.
      
      Full eBPF ISA description is in Documentation/networking/filter.txt
      Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
      Acked-by: Daniel Borkmann <dborkman@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: filter: add "load 64-bit immediate" eBPF instruction · 02ab695b
      Authored by Alexei Starovoitov
      Add the BPF_LD_IMM64 instruction to load a 64-bit immediate value into a register.
      All previous instructions were 8 bytes long; this is the first 16-byte instruction.
      Two consecutive 'struct bpf_insn' blocks are interpreted as a single instruction:
      insn[0].code = BPF_LD | BPF_DW | BPF_IMM
      insn[0].dst_reg = destination register
      insn[0].imm = lower 32-bit
      insn[1].code = 0
      insn[1].imm = upper 32-bit
      All unused fields must be zero.
      
      Classic BPF has similar instruction: BPF_LD | BPF_W | BPF_IMM
      which loads 32-bit immediate value into a register.
      
      x64 JITs it as a single 'movabsq $imm64, %rax'
      arm64 may JIT it as a sequence of four 'movk x0, #imm16, lsl #shift' insns
      
      Note that old eBPF programs are binary compatible with new interpreter.
      
      It helps eBPF programs load a 64-bit constant into a register with one
      instruction instead of using two registers and 4 instructions:
      BPF_MOV32_IMM(R1, imm32)
      BPF_ALU64_IMM(BPF_LSH, R1, 32)
      BPF_MOV32_IMM(R2, imm32)
      BPF_ALU64_REG(BPF_OR, R1, R2)
      
      User space generated programs will use this instruction to load constants only.
      
      To tell kernel that user space needs a pointer the _pseudo_ variant of
      this instruction may be added later, which will use extra bits of encoding
      to indicate what type of pointer user space is asking kernel to provide.
      For example 'off' or 'src_reg' fields can be used for such purpose.
      src_reg = 1 could mean that user space is asking kernel to validate and
      load in-kernel map pointer.
      src_reg = 2 could mean that user space needs readonly data section pointer
      src_reg = 3 could mean that user space needs a pointer to per-cpu local data
      All such future pseudo instructions will not be carrying the actual pointer
      as part of the instruction, but rather will be treated as a request to kernel
      to provide one. The kernel will verify the request_for_a_pointer, then
      will drop _pseudo_ marking and will store actual internal pointer inside
      the instruction, so the end result is the interpreter and JITs never
      see pseudo BPF_LD_IMM64 insns and only operate on generic BPF_LD_IMM64 that
      loads 64-bit immediate into a register. User space never operates on direct
      pointers and verifier can easily recognize request_for_pointer vs other
      instructions.
      Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
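The two-slot encoding described in this commit message can be sketched in plain C. The struct below is a local mirror of the `struct bpf_insn` layout, redefined so the example builds without kernel headers. The helper names `emit_ld_imm64()` and `read_ld_imm64()` are hypothetical, and the opcode constants are copied by value from the kernel's BPF headers. This is an illustration of the field layout only, not the kernel's interpreter or JIT code.

```c
#include <assert.h>
#include <stdint.h>

/* Opcode bits, copied by value from the kernel's BPF headers. */
#define BPF_LD  0x00
#define BPF_IMM 0x00
#define BPF_DW  0x18

/* Local mirror of the 8-byte 'struct bpf_insn' layout. */
struct bpf_insn_sketch {
	uint8_t code;            /* opcode */
	uint8_t dst_reg : 4;     /* destination register */
	uint8_t src_reg : 4;     /* source register */
	int16_t off;             /* signed offset */
	int32_t imm;             /* signed 32-bit immediate */
};

/* Fill two consecutive slots with one "load 64-bit immediate".
 * All unused fields are left at zero, as the commit message requires. */
void emit_ld_imm64(struct bpf_insn_sketch pair[2], uint8_t dst, uint64_t value)
{
	pair[0] = (struct bpf_insn_sketch){
		.code = BPF_LD | BPF_DW | BPF_IMM,
		.dst_reg = dst & 0x0f,
		.imm = (int32_t)(uint32_t)value,           /* lower 32 bits */
	};
	pair[1] = (struct bpf_insn_sketch){
		.imm = (int32_t)(uint32_t)(value >> 32),   /* upper 32 bits */
	};
}

/* Reassemble the 64-bit constant from the two slots. */
uint64_t read_ld_imm64(const struct bpf_insn_sketch pair[2])
{
	assert(pair[0].code == (BPF_LD | BPF_DW | BPF_IMM));
	assert(pair[1].code == 0);
	return (uint64_t)(uint32_t)pair[0].imm |
	       ((uint64_t)(uint32_t)pair[1].imm << 32);
}
```

For example, emitting the constant 0x1122334455667788 into register 1 stores 0x55667788 in the first slot's `imm` and 0x11223344 in the second slot's `imm`, and `read_ld_imm64()` returns the original value.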
2. 09 Sep, 2014 6 commits
3. 08 Sep, 2014 19 commits