1. 28 December 2017, 3 commits
2. 27 December 2017, 1 commit
    • rtnetlink: Replace implementation of ASSERT_RTNL() macro with WARN_ONCE() · 66364bdf
      Authored by Leon Romanovsky
      The ASSERT_RTNL() macro is an open-coded variant of WARN_ONCE(), with
      two exceptions. First, it prints a stack trace on every hit, not only
      once as WARN_ONCE() does. Second, the user can disable WARN_ONCE()
      prints by setting CONFIG_BUG to N.
      
      The multiple stack dumps are not actually needed: calls without the
      rtnl lock held are programming errors, and the user can do nothing
      about them except complain to the mailing list after the first
      occurrence of such a failure.
      
      A user who disabled BUG/WARN prints did so explicitly, because this
      option is enabled by default in the upstream kernel and in
      distributions. Such a user does not want to see prints about missing
      locks either.
      
      This patch replaces the open-coded variant with the already existing
      macro and changes the error print to fire only once.
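      
      A minimal sketch of the resulting macro (the exact message text is an
      assumption):
      
          #define ASSERT_RTNL() \
                  WARN_ONCE(!rtnl_is_locked(), \
                            "RTNL: assertion failed at %s (%d)\n", \
                            __FILE__, __LINE__)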
      Reviewed-by: Mark Bloch <markb@mellanox.com>
      Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  3. 22 December 2017, 11 commits
  4. 21 December 2017, 9 commits
    • xfrm: wrap xfrmdev_ops with offload config · 9cb0d21d
      Authored by Shannon Nelson
      There's no reason to define netdev->xfrmdev_ops if
      the offload facility is not CONFIG'd in.
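      
      A sketch of the idea, guarding the member in struct net_device with
      the existing CONFIG_XFRM_OFFLOAD option:
      
          struct net_device {
                  /* ... */
          #ifdef CONFIG_XFRM_OFFLOAD
                  const struct xfrmdev_ops *xfrmdev_ops;
          #endif
                  /* ... */
          };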
      Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com>
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
    • xfrm: check for xdo_dev_state_free · 7f05b467
      Authored by Shannon Nelson
      The current XFRM code assumes that we've implemented the
      xdo_dev_state_free() callback, even if it is meaningless to the driver.
      This patch adds a check for it before calling, as done in other APIs,
      to prevent a NULL function pointer kernel crash.
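      
      A minimal sketch of the guarded call (simplified; the surrounding
      helper shape is an assumption):
      
          static inline void xfrm_dev_state_free(struct xfrm_state *x)
          {
                  struct net_device *dev = x->xso.dev;
      
                  /* only call into the driver if it implemented the hook */
                  if (dev && dev->xfrmdev_ops &&
                      dev->xfrmdev_ops->xdo_dev_state_free)
                          dev->xfrmdev_ops->xdo_dev_state_free(x);
          }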
      Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com>
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
    • bpf: fix integer overflows · bb7f0f98
      Authored by Alexei Starovoitov
      There were various issues related to the limited size of integers used in
      the verifier:
       - `off + size` overflow in __check_map_access()
       - `off + reg->off` overflow in check_mem_access()
       - `off + reg->var_off.value` overflow or 32-bit truncation of
         `reg->var_off.value` in check_mem_access()
       - 32-bit truncation in check_stack_boundary()
      
      Make sure that any integer math cannot overflow by not allowing
      pointer math with large values.
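      
      For illustration, a hypothetical bounds check of this shape is
      defeated by 32-bit wraparound (this is not verifier code):
      
          /* with a signed 32-bit off near INT_MAX, off + size wraps
           * (undefined behaviour in C, wraps in practice) */
          int off = INT_MAX - 2, size = 4;
      
          if (off + size > value_size)    /* sum wraps to a negative value */
                  return -EACCES;         /* never taken: access wrongly allowed */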
      
      Also reduce the scope of "scalar op scalar" tracking.
      
      Fixes: f1174f77 ("bpf/verifier: rework value tracking")
      Reported-by: Jann Horn <jannh@google.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • block: unalign call_single_data in struct request · 4ccafe03
      Authored by Jens Axboe
      A previous change blindly added massive alignment to the
      call_single_data structure in struct request. This ballooned it in size
      from 296 to 320 bytes on my setup, for no valid reason at all.
      
      Use the unaligned struct __call_single_data variant instead.
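      
      A sketch of the change inside struct request (assuming the aligned
      typedef call_single_data_t was in use before):
      
          struct request {
                  /* ... */
                  /* was: call_single_data_t csd; the typedef carries
                   * forced cacheline alignment, the plain struct does not */
                  struct __call_single_data csd;
                  /* ... */
          };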
      
      Fixes: 966a9671 ("smp: Avoid using two cache lines for struct call_single_data")
      Cc: stable@vger.kernel.org # v4.14
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • net: sock: replace sk_state_load with inet_sk_state_load and remove sk_state_store · 986ffdfd
      Authored by Yafang Shao
      sk_state_load is only used by AF_INET/AF_INET6, so rename it to
      inet_sk_state_load and move it into inet_sock.h.
      
      sk_state_store is removed as it is not used any more.
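      
      A sketch of the renamed helper, assuming it keeps sk_state_load's
      acquire semantics:
      
          static inline int inet_sk_state_load(const struct sock *sk)
          {
                  /* paired with the store-release when the state is set */
                  return smp_load_acquire(&sk->sk_state);
          }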
      Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: tracepoint: replace tcp_set_state tracepoint with inet_sock_set_state tracepoint · 563e0bb0
      Authored by Yafang Shao
      Since sk_state is a common field of struct sock, the state-transition
      tracepoint should not be a TCP-specific feature. It traces all
      AF_INET state transitions, so rename it to inet_sock_set_state, with
      some minor changes, and move it into trace/events/sock.h. There is no
      need to create a file named trace/events/inet_sock.h for this single
      tracepoint.
      
      Two helpers are introduced to trace sk_state transitions:
          - void inet_sk_state_store(struct sock *sk, int newstate);
          - void inet_sk_set_state(struct sock *sk, int state);
      Since trace headers should not be included in other header files,
      they are defined in sock.c.
      
      Protocols such as SCTP may be compiled as modules, hence
      inet_sk_set_state() is exported.
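      
      A sketch of the store helper, assuming it emits the tracepoint and
      keeps the release ordering of the old sk_state_store:
      
          void inet_sk_state_store(struct sock *sk, int newstate)
          {
                  trace_inet_sock_set_state(sk, sk->sk_state, newstate);
                  smp_store_release(&sk->sk_state, newstate);
          }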
      Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: Export to userspace the TCP state names for the trace events · d7b850a7
      Authored by Steven Rostedt (VMware)
      The TCP trace events (specifically tcp_set_state) map enums to symbol
      names via __print_symbolic(). But this only works for reading trace events
      from the tracefs trace files. If perf or trace-cmd were to record these
      events, the event format file does not convert the enum names into numbers,
      and you get something like:
      
      __print_symbolic(REC->oldstate,
          { TCP_ESTABLISHED, "TCP_ESTABLISHED" },
          { TCP_SYN_SENT, "TCP_SYN_SENT" },
          { TCP_SYN_RECV, "TCP_SYN_RECV" },
          { TCP_FIN_WAIT1, "TCP_FIN_WAIT1" },
          { TCP_FIN_WAIT2, "TCP_FIN_WAIT2" },
          { TCP_TIME_WAIT, "TCP_TIME_WAIT" },
          { TCP_CLOSE, "TCP_CLOSE" },
          { TCP_CLOSE_WAIT, "TCP_CLOSE_WAIT" },
          { TCP_LAST_ACK, "TCP_LAST_ACK" },
          { TCP_LISTEN, "TCP_LISTEN" },
          { TCP_CLOSING, "TCP_CLOSING" },
          { TCP_NEW_SYN_RECV, "TCP_NEW_SYN_RECV" })
      
      Here trace-cmd and perf do not know the values of those enums.
      
      Use the TRACE_DEFINE_ENUM() macro, which has the trace events convert
      the enum names into their values at system boot. This allows perf and
      trace-cmd to see actual numbers and not enums:
      
      __print_symbolic(REC->oldstate,
          { 1, "TCP_ESTABLISHED" },
          { 2, "TCP_SYN_SENT" },
          { 3, "TCP_SYN_RECV" },
          { 4, "TCP_FIN_WAIT1" },
          { 5, "TCP_FIN_WAIT2" },
          { 6, "TCP_TIME_WAIT" },
          { 7, "TCP_CLOSE" },
          { 8, "TCP_CLOSE_WAIT" },
          { 9, "TCP_LAST_ACK" },
          { 10, "TCP_LISTEN" },
          { 11, "TCP_CLOSING" },
          { 12, "TCP_NEW_SYN_RECV" })
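      
      The mechanism is one TRACE_DEFINE_ENUM() per state in the trace
      header; a sketch:
      
          TRACE_DEFINE_ENUM(TCP_ESTABLISHED);
          TRACE_DEFINE_ENUM(TCP_SYN_SENT);
          /* ... one line per TCP state used by the event ... */
          TRACE_DEFINE_ENUM(TCP_NEW_SYN_RECV);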
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      Acked-by: Song Liu <songliubraving@fb.com>
      Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • block-throttle: avoid double charge · 111be883
      Authored by Shaohua Li
      If a bio is throttled and split after throttling, the bio could be
      resubmitted and enter throttling again. This would cause part of the
      bio to be charged multiple times. If the cgroup has an IO limit, the
      double charge significantly harms performance. Bio splits became
      quite common after the arbitrary bio size change.
      
      To fix this, we always set the BIO_THROTTLED flag when a bio is
      throttled. If the bio is cloned/split, we copy the flag to the new
      bio too, to avoid a double charge. However, a cloned bio could be
      directed to a new disk, where keeping the flag is a problem. The
      observation is that we always set a new disk for the bio in this
      case, so we can clear the flag in bio_set_dev().
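      
      A sketch of that clearing, written as a standalone helper for
      readability (the helper name is hypothetical; the real hook is
      bio_set_dev() itself):
      
          /* moving a bio to a different disk drops BIO_THROTTLED, so
           * the new queue's throttling charges it exactly once */
          static inline void bio_assign_dev(struct bio *bio,
                                            struct block_device *bdev)
          {
                  if (bio->bi_disk != bdev->bd_disk)
                          bio_clear_flag(bio, BIO_THROTTLED);
                  bio->bi_disk   = bdev->bd_disk;
                  bio->bi_partno = bdev->bd_partno;
          }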
      
      This issue has existed for a long time; the arbitrary bio size change
      just made it worse, so this should go into stable at least back to v4.2.
      
      V1 -> V2: do not add an extra field to the bio, based on discussion with Tejun.
      
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Cc: stable@vger.kernel.org
      Acked-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Shaohua Li <shli@fb.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • cls_bpf: fix offload assumptions after callback conversion · 102740bd
      Authored by Jakub Kicinski
      cls_bpf used to take care of tracking what offload state a filter
      is in, i.e. it would track whether the offload request succeeded or
      not. This information was then used to issue correct requests to
      the driver, e.g. requesting statistics only for offloaded filters,
      removing only filters which were offloaded, using add instead of
      replace if the previous filter was not added, etc.
      
      This tracking of offload state no longer functions with the new
      callback infrastructure.  There could be multiple entities trying
      to offload the same filter.
      
      Throw out all the tracking and the corresponding commands, and simply
      pass both the old and the new bpf program to the drivers. Drivers
      will have to deal with offload state tracking by themselves.
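      
      A sketch of the resulting offload descriptor (field set abridged;
      only the prog/oldprog pair is the point here):
      
          struct tc_cls_bpf_offload {
                  struct tcf_exts *exts;
                  struct bpf_prog *oldprog;   /* previous program, NULL if none */
                  struct bpf_prog *prog;      /* new program, NULL on removal */
                  const char *name;
          };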
      
      Fixes: 3f7889c4 ("net: sched: cls_bpf: call block callbacks for offload")
      Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  5. 20 December 2017, 6 commits
  6. 19 December 2017, 7 commits
  7. 18 December 2017, 3 commits
    • KVM: Fix stack-out-of-bounds read in write_mmio · e39d200f
      Authored by Wanpeng Li
      Reported by syzkaller:
      
        BUG: KASAN: stack-out-of-bounds in write_mmio+0x11e/0x270 [kvm]
        Read of size 8 at addr ffff8803259df7f8 by task syz-executor/32298
      
        CPU: 6 PID: 32298 Comm: syz-executor Tainted: G           OE    4.15.0-rc2+ #18
        Hardware name: LENOVO ThinkCentre M8500t-N000/SHARKBAY, BIOS FBKTC1AUS 02/16/2016
        Call Trace:
         dump_stack+0xab/0xe1
         print_address_description+0x6b/0x290
         kasan_report+0x28a/0x370
         write_mmio+0x11e/0x270 [kvm]
         emulator_read_write_onepage+0x311/0x600 [kvm]
         emulator_read_write+0xef/0x240 [kvm]
         emulator_fix_hypercall+0x105/0x150 [kvm]
         em_hypercall+0x2b/0x80 [kvm]
         x86_emulate_insn+0x2b1/0x1640 [kvm]
         x86_emulate_instruction+0x39a/0xb90 [kvm]
         handle_exception+0x1b4/0x4d0 [kvm_intel]
         vcpu_enter_guest+0x15a0/0x2640 [kvm]
         kvm_arch_vcpu_ioctl_run+0x549/0x7d0 [kvm]
         kvm_vcpu_ioctl+0x479/0x880 [kvm]
         do_vfs_ioctl+0x142/0x9a0
         SyS_ioctl+0x74/0x80
         entry_SYSCALL_64_fastpath+0x23/0x9a
      
      The patched-vmmcall path writes the 3-byte opcode 0F 01 C1 (vmcall)
      into guest memory; however, the write_mmio tracepoint always prints
      8 bytes through *(u64 *)val, since kvm splits the mmio access into
      8-byte chunks. This leaks 5 bytes from the kernel stack
      (CVE-2017-17741). This patch fixes it by accessing just the bytes
      which we operate on.
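      
      A sketch of the fix in the tracepoint's assignment, assuming the
      event now receives a pointer and copies only len bytes:
      
          /* copy only the bytes that were actually written */
          __entry->val = 0;
          if (val)
                  memcpy(&__entry->val, val,
                         min_t(u32, sizeof(__entry->val), len));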
      
      Before patch:
      
      syz-executor-5567  [007] .... 51370.561696: kvm_mmio: mmio write len 3 gpa 0x10 val 0x1ffff10077c1010f
      
      After patch:
      
      syz-executor-13416 [002] .... 51302.299573: kvm_mmio: mmio write len 3 gpa 0x10 val 0xc1010f
      Reported-by: Dmitry Vyukov <dvyukov@google.com>
      Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
      Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
      Tested-by: Marc Zyngier <marc.zyngier@arm.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Marc Zyngier <marc.zyngier@arm.com>
      Cc: Christoffer Dall <christoffer.dall@linaro.org>
      Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: arm/arm64: timer: Don't set irq as forwarded if no usable GIC · f384dcfe
      Authored by Marc Zyngier
      If we don't have a usable GIC, do not try to set the vcpu affinity
      as this is guaranteed to fail.
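      
      A minimal sketch of such a guard (the specific check and call site
      here are assumptions, not the patch's exact code):
      
          /* hypothetical guard: only forward the timer irq when an
           * in-kernel GIC is usable */
          if (irqchip_in_kernel(vcpu->kvm))
                  ret = irq_set_vcpu_affinity(host_vtimer_irq, vcpu);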
      Reported-by: Andre Przywara <andre.przywara@arm.com>
      Reviewed-by: Andre Przywara <andre.przywara@arm.com>
      Tested-by: Andre Przywara <andre.przywara@arm.com>
      Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
      Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
    • bpf: x64: add JIT support for multi-function programs · 1c2a088a
      Authored by Alexei Starovoitov
      A typical JIT does several passes over the bpf instructions to
      compute the total size and the relative offsets of jumps and calls.
      With multiple bpf functions calling each other, all relative calls
      will have invalid offsets initially, therefore we need an additional
      last pass over the program to emit calls with correct offsets.
      For example, in the case of three bpf functions:
      main:
        call foo
        call bpf_map_lookup
        exit
      foo:
        call bar
        exit
      bar:
        exit
      
      We will call bpf_int_jit_compile() independently for main(), foo()
      and bar(); the x64 JIT typically does 4-5 passes to converge.
      After these initial passes the images for these 3 functions
      will be correct except for the call targets, since the start
      addresses of foo() and bar() are unknown while main() is being JITed
      (note that the call to bpf_map_lookup will be resolved properly
      during the initial passes).
      Once the start addresses of the 3 functions are known, we patch
      call_insn->imm to point to the right functions and call
      bpf_int_jit_compile() again, which needs only one pass.
      Additional safety checks are done to make sure this
      last pass doesn't produce an image that is larger or smaller
      than the previous pass.
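      
      A simplified sketch of that fixup loop (variable names are
      assumptions; abridged from the verifier-side logic):
      
          /* once every function's image address is known, rewrite each
           * bpf-to-bpf call immediate, then re-JIT in one final pass */
          for (k = 0; k < nfuncs; k++) {
                  struct bpf_insn *insn = func[k]->insnsi;
                  int i;
      
                  for (i = 0; i < func[k]->len; i++, insn++) {
                          if (insn->code != (BPF_JMP | BPF_CALL) ||
                              insn->src_reg != BPF_PSEUDO_CALL)
                                  continue;
                          /* insn->off holds the callee's index here */
                          insn->imm = (int)((u8 *)func[insn->off]->bpf_func -
                                            (u8 *)__bpf_call_base);
                  }
          }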
      
      When constant blinding is on, it's applied to all functions in the
      first pass, since doing it again in the last pass could change the
      size of the JITed code.
      
      Tested on x64 and arm64 hardware with JIT on/off and blinding on/off.
      x64 JITs bpf-to-bpf calls correctly, while arm64 falls back to the
      interpreter. All other JITs that support normal BPF_CALL will behave
      the same way, since a bpf-to-bpf call is equivalent to a
      bpf-to-kernel call from the JIT's point of view.
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>