1. 23 May 2018, 5 commits
    • M
      bpf: btf: Rename btf_key_id and btf_value_id in bpf_map_info · 9b2cf328
      Committed by Martin KaFai Lau
      In "struct bpf_map_info", the name "btf_id", "btf_key_id" and "btf_value_id"
      could cause confusion because the "id" of "btf_id" means the BPF obj id
      given to the BTF object while
      "btf_key_id" and "btf_value_id" means the BTF type id within
      that BTF object.
      
      To make it clear, btf_key_id and btf_value_id are
      renamed to btf_key_type_id and btf_value_type_id.
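      
      For reference, a hedged userspace sketch of reading the renamed fields
      back via BPF_OBJ_GET_INFO_BY_FD (using libbpf's bpf_obj_get_info_by_fd();
      map_fd is assumed to be a valid map file descriptor):
      
          #include <bpf/bpf.h>          /* bpf_obj_get_info_by_fd() */
          #include <linux/bpf.h>        /* struct bpf_map_info */
          #include <stdio.h>
          #include <string.h>
          
          static void print_btf_ids(int map_fd)
          {
                  struct bpf_map_info info;
                  __u32 len = sizeof(info);
          
                  memset(&info, 0, sizeof(info));
                  if (bpf_obj_get_info_by_fd(map_fd, &info, &len))
                          return;
                  /* btf_id is a BPF object id; the *_type_id fields are type
                   * ids within that BTF object */
                  printf("btf_id=%u key_type=%u value_type=%u\n",
                         info.btf_id, info.btf_key_type_id,
                         info.btf_value_type_id);
          }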
      Suggested-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: Martin KaFai Lau <kafai@fb.com>
      Acked-by: Yonghong Song <yhs@fb.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      9b2cf328
    • M
      bpf: btf: Remove unused bits from uapi/linux/btf.h · aea2f7b8
      Committed by Martin KaFai Lau
      This patch does the following:
      1. Limit BTF_MAX_TYPES and BTF_MAX_NAME_OFFSET to 64k.  We can
         raise it later.
      
      2. Remove BTF_TYPE_PARENT and BTF_STR_TBL_ELF_ID.  They are
         currently encoded at the highest bit of a u32.
         This is because the current use case does not require supporting
         a parent type (i.e. a type_id referring to a type in another BTF
         file).  It also does not support referring to a string in ELF.
      
         The BTF_TYPE_PARENT and BTF_STR_TBL_ELF_ID checks are replaced
         by BTF_TYPE_ID_CHECK and BTF_STR_OFFSET_CHECK which are
         defined in btf.c instead of uapi/linux/btf.h.
      
      3. Limit BTF_INFO_KIND from 5 bits to 4 bits, which is enough.
         This leaves unused bits as headroom if we ever need them later.
      
      4. The root bit in BTF_INFO is also removed because it is not
         used in the current use case.
      
      5. Remove BTF_INT_VARARGS since func type is not supported now.
         The BTF_INT_ENCODING is limited to 4 bits instead of 8 bits.
      
      The above can be added back later because the verifier
      ensures the unused bits are zeros.
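      
      As a sketch of point 3, the info-word accessors after this change look
      roughly like this (kind in bits 24-27, vlen in bits 0-15; treat the
      exact bit layout as an assumption based on the uapi header of this era):
      
          /* extract the 4-bit kind and 16-bit vlen from a type's u32 info */
          #define BTF_INFO_KIND(info)     (((info) >> 24) & 0x0f)
          #define BTF_INFO_VLEN(info)     ((info) & 0xffff)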
      Signed-off-by: Martin KaFai Lau <kafai@fb.com>
      Acked-by: Yonghong Song <yhs@fb.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      aea2f7b8
    • M
      bpf: btf: Check array->index_type · 4ef5f574
      Committed by Martin KaFai Lau
      Instead of ignoring the array->index_type field, enforce that it
      must be a BTF_KIND_INT of size 1/2/4/8 bytes.
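      
      A hedged sketch of such a check in btf.c (the helper name here is
      illustrative, not necessarily the patch's):
      
          /* the array index type must resolve to a BTF_KIND_INT of
           * size 1, 2, 4 or 8 bytes */
          static int btf_array_check_index_type(const struct btf *btf,
                                                u32 index_type_id)
          {
                  const struct btf_type *t = btf_type_by_id(btf, index_type_id);
          
                  if (!t || BTF_INFO_KIND(t->info) != BTF_KIND_INT)
                          return -EINVAL;
                  if (t->size != 1 && t->size != 2 &&
                      t->size != 4 && t->size != 8)
                          return -EINVAL;
                  return 0;
          }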
      Signed-off-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      4ef5f574
    • M
      bpf: btf: Change how section is supported in btf_header · f80442a4
      Committed by Martin KaFai Lau
      There are currently unused section descriptions in the btf_header.  Those
      sections are here to support future BTF use cases.  For example, the
      func section (func_off) is to support function signatures (e.g. the BPF
      prog function signature).
      
      Instead of spelling out all potential sections up-front in the btf_header,
      this patch changes btf_header so that extending it (e.g. adding
      a section) is possible later.  The unused ones can be removed for now
      and added back later.
      
      This patch:
      1. adds a hdr_len to the btf_header.  It will allow adding
      sections (and other info like parent_label and parent_name)
      later.  The check is similar to the existing bpf_attr check:
      if a user passes in a longer hdr_len, the kernel
      ensures the extra trailing bytes are 0.
      
      2. allows the section order in the BTF object to be
      different from its sec_off order in btf_header.
      
      3. each sec_off is followed by a sec_len.  There must be no gaps or
      overlaps between sections.
      
      The string section is ensured to be at the end due to the 4-byte
      alignment requirement of the type section.
      
      The above changes will allow enough flexibility to
      add new sections (and other info) to the btf_header later.
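      
      The resulting header layout (per uapi/linux/btf.h after this change;
      needs <linux/types.h> for the __uNN types):
      
          struct btf_header {
                  __u16   magic;
                  __u8    version;
                  __u8    flags;
                  __u32   hdr_len;
          
                  /* All offsets are in bytes relative to the end of this header */
                  __u32   type_off;       /* offset of type section */
                  __u32   type_len;       /* length of type section */
                  __u32   str_off;        /* offset of string section */
                  __u32   str_len;        /* length of string section */
          };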
      
      This patch also removes an unnecessary !err check
      at the end of btf_parse().
      Signed-off-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      f80442a4
    • M
      bpf: Expose check_uarg_tail_zero() · dcab51f1
      Committed by Martin KaFai Lau
      This patch exposes check_uarg_tail_zero() which will
      be reused by a later BTF patch.  Its name is changed to
      bpf_check_uarg_tail_zero().
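      
      For context, a simplified sketch of the helper's contract (not the
      exact kernel implementation): if userspace passes a struct larger than
      the kernel knows, every byte past the expected size must be zero.
      
          int bpf_check_uarg_tail_zero(void __user *uaddr,
                                       size_t expected_size, size_t actual_size)
          {
                  unsigned char __user *addr = uaddr + expected_size;
                  unsigned char __user *end  = uaddr + actual_size;
                  unsigned char val;
          
                  if (actual_size <= expected_size)
                          return 0;
                  for (; addr < end; addr++) {
                          if (get_user(val, addr))
                                  return -EFAULT;
                          /* a non-zero tail byte means userspace used fields
                           * this kernel does not understand */
                          if (val)
                                  return -E2BIG;
                  }
                  return 0;
          }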
      Signed-off-by: Martin KaFai Lau <kafai@fb.com>
      Acked-by: Yonghong Song <yhs@fb.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      dcab51f1
  2. 19 May 2018, 1 commit
  3. 18 May 2018, 3 commits
  4. 17 May 2018, 2 commits
    • J
      bpf: sockmap, on update propagate errors back to userspace · e23afe5e
      Committed by John Fastabend
      When an error happens in the sockmap element update logic, also pass
      the error up to the user.
      
      Fixes: e5cd3abc ("bpf: sockmap, refactor sockmap routines to work with hashmap")
      Signed-off-by: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      e23afe5e
    • Y
      bpf: fix sock hashmap kmalloc warning · 683d2ac3
      Committed by Yonghong Song
      syzbot reported a kernel warning below:
        WARNING: CPU: 0 PID: 4499 at mm/slab_common.c:996 kmalloc_slab+0x56/0x70 mm/slab_common.c:996
        Kernel panic - not syncing: panic_on_warn set ...
      
        CPU: 0 PID: 4499 Comm: syz-executor050 Not tainted 4.17.0-rc3+ #9
        Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
        Call Trace:
         __dump_stack lib/dump_stack.c:77 [inline]
         dump_stack+0x1b9/0x294 lib/dump_stack.c:113
         panic+0x22f/0x4de kernel/panic.c:184
         __warn.cold.8+0x163/0x1b3 kernel/panic.c:536
         report_bug+0x252/0x2d0 lib/bug.c:186
         fixup_bug arch/x86/kernel/traps.c:178 [inline]
         do_error_trap+0x1de/0x490 arch/x86/kernel/traps.c:296
         do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:315
         invalid_op+0x14/0x20 arch/x86/entry/entry_64.S:992
        RIP: 0010:kmalloc_slab+0x56/0x70 mm/slab_common.c:996
        RSP: 0018:ffff8801d907fc58 EFLAGS: 00010246
        RAX: 0000000000000000 RBX: ffff8801aeecb280 RCX: ffffffff8185ebd7
        RDX: 0000000000000000 RSI: 0000000000000000 RDI: 00000000ffffffe1
        RBP: ffff8801d907fc58 R08: ffff8801adb5e1c0 R09: ffffed0035a84700
        R10: ffffed0035a84700 R11: ffff8801ad423803 R12: ffff8801aeecb280
        R13: 00000000fffffff4 R14: ffff8801ad891a00 R15: 00000000014200c0
         __do_kmalloc mm/slab.c:3713 [inline]
         __kmalloc+0x25/0x760 mm/slab.c:3727
         kmalloc include/linux/slab.h:517 [inline]
         map_get_next_key+0x24a/0x640 kernel/bpf/syscall.c:858
         __do_sys_bpf kernel/bpf/syscall.c:2131 [inline]
         __se_sys_bpf kernel/bpf/syscall.c:2096 [inline]
         __x64_sys_bpf+0x354/0x4f0 kernel/bpf/syscall.c:2096
         do_syscall_64+0x1b1/0x800 arch/x86/entry/common.c:287
         entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      The test case is against a sock hashmap with a key size of 0xffffffe1.
      Such a large key size causes the below code in function
      sock_hash_alloc() to overflow and produce a smaller elem_size,
      so map creation succeeds.
          htab->elem_size = sizeof(struct htab_elem) +
                            round_up(htab->map.key_size, 8);
      
      Later, when map_get_next_key is called and the kernel tries and
      fails to allocate the key, it issues the above warning.
      
      Similar to hashtab, ensure the key size is at most
      MAX_BPF_STACK for map creation to succeed.
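      
      A hedged sketch of the implied bound check in sock_hash_alloc()
      (exact placement and error code may differ from the actual patch):
      
          /* reject oversized keys before elem_size can overflow */
          if (attr->key_size > MAX_BPF_STACK)
                  return ERR_PTR(-E2BIG);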
      
      Fixes: 81110384 ("bpf: sockmap, add hash map support")
      Reported-by: syzbot+e4566d29080e7f3460ff@syzkaller.appspotmail.com
      Signed-off-by: Yonghong Song <yhs@fb.com>
      Acked-by: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      683d2ac3
  5. 16 May 2018, 1 commit
  6. 15 May 2018, 2 commits
    • J
      bpf: sockmap, refactor sockmap routines to work with hashmap · e5cd3abc
      Committed by John Fastabend
      This patch only refactors the existing sockmap code. This will allow
      much of the psock initialization code path and BPF helper code to
      work for both sockmap BPF map types: the currently supported
      array-backed type, and the new hash-backed map type, sockhash.
      
      Most of the fallout comes from three changes:
      
        - Pushing bpf programs into an independent structure so we
          can use it from the htab struct in the next patch.
        - Generalizing helpers to use void *key instead of the hardcoded
          u32 (see the sketch after this list).
        - Instead of passing map/key through the metadata we now do
          the lookup inline. This avoids storing the key in the metadata
          which will be useful when keys can be longer than 4 bytes. We
          rename the sk pointers to sk_redir at this point as well to
          avoid any confusion between the current sk pointer and the
          redirect pointer sk_redir.
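      
      A hedged sketch of the second change above, generalizing a helper from
      a hardcoded u32 to void *key (names illustrative):
      
          /* before: the key was assumed to be a u32 array index;
           * after: helpers take void *key so the array-backed sockmap
           * (u32 index) and the hash-backed sockhash (arbitrary key
           * bytes) can share one code path */
          static void *sock_map_lookup(struct bpf_map *map, void *key)
          {
                  return __sock_map_lookup_elem(map, *(u32 *)key);
          }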
      Signed-off-by: John Fastabend <john.fastabend@gmail.com>
      Acked-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      e5cd3abc
    • S
      bpf: enable stackmap with build_id in nmi context · bae77c5e
      Committed by Song Liu
      Currently, we cannot parse build_id in nmi context because of
      up_read(&current->mm->mmap_sem); this makes stackmap with build_id
      less useful. This patch enables parsing build_id in nmi by moving
      the up_read() call into irq_work. To avoid memory allocation in nmi
      context, we use a per-cpu variable for the irq_work, so only
      one irq_work per cpu is allowed. If the irq_work is in use, we
      fall back to reporting only ips.
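      
      A hedged sketch of the per-cpu irq_work pattern described above
      (structure and names approximate the actual stackmap code):
      
          #include <linux/irq_work.h>
          #include <linux/percpu.h>
          #include <linux/rwsem.h>
          
          struct stack_map_irq_work {
                  struct irq_work irq_work;
                  struct rw_semaphore *sem;
          };
          
          /* deferred out of nmi context: drop mmap_sem once it is safe */
          static void do_up_read(struct irq_work *entry)
          {
                  struct stack_map_irq_work *work;
          
                  work = container_of(entry, struct stack_map_irq_work,
                                      irq_work);
                  up_read(work->sem);
          }
          
          /* one work item per cpu; if it is busy, fall back to ips only */
          static DEFINE_PER_CPU(struct stack_map_irq_work, up_read_work);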
      
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Song Liu <songliubraving@fb.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      bae77c5e
  7. 11 May 2018, 2 commits
  8. 10 May 2018, 1 commit
  9. 09 May 2018, 5 commits
  10. 05 May 2018, 6 commits
  11. 04 May 2018, 7 commits
    • J
      bpf: add faked "ending" subprog · 4cb3d99c
      Committed by Jiong Wang
      There are quite a few code snippets like the following in the verifier:
      
             subprog_start = 0;
             if (env->subprog_cnt == cur_subprog + 1)
                     subprog_end = insn_cnt;
             else
                     subprog_end = env->subprog_info[cur_subprog + 1].start;
      
      The reason is that there is no marker in the subprog_info array to
      indicate its end.
      
      We can resolve this issue by introducing a fake "ending" subprog.
      This special "ending" subprog has "insn_cnt" as its start offset, so it
      serves as the end marker whenever we iterate over all subprogs.
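      
      With the fake end marker in place, the snippet above collapses to a
      sketch like:
      
          /* subprog_info[env->subprog_cnt] is the fake "ending" subprog whose
           * .start == insn_cnt, so the last subprog needs no special case */
          subprog_start = env->subprog_info[cur_subprog].start;
          subprog_end   = env->subprog_info[cur_subprog + 1].start;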
      Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      4cb3d99c
    • J
      bpf: centre subprog information fields · 9c8105bd
      Committed by Jiong Wang
      It is better to centre all subprog information fields in one structure.
      This structure could later serve as a function node in a call graph.
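      
      A sketch of the centralized structure as of this series (later kernels
      grew more fields):
      
          struct bpf_subprog_info {
                  u32 start;       /* insn idx of function entry point */
                  u16 stack_depth; /* max. stack depth used by this function */
          };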
      Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      9c8105bd
    • J
      bpf: unify main prog and subprog · f910cefa
      Committed by Jiong Wang
      Currently, the verifier treats the main prog and subprogs differently.
      All detected subprogs are kept in env->subprog_starts, while the main
      prog is not kept there; instead, the main prog is implicitly defined
      as the prog starting at 0.
      
      There is actually no difference between the main prog and a subprog,
      so it is better to unify them and register all detected progs in
      env->subprog_starts.
      
      This also helps simplify some code logic.
      Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      f910cefa
    • D
      bpf: use array_index_nospec in find_prog_type · d0f1a451
      Committed by Daniel Borkmann
      Commit 9ef09e35 ("bpf: fix possible spectre-v1 in find_and_alloc_map()")
      converted find_and_alloc_map() over to use array_index_nospec() to sanitize
      the map type that user space passes on map creation, and this patch does an
      analogous conversion for progs in find_prog_type(), as the prog type is
      also passed from user space when loading progs, as attr->prog_type.
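      
      A hedged sketch of the pattern (array and surrounding logic approximate
      kernel/bpf/syscall.c; array_index_nospec() is from <linux/nospec.h>):
      
          static int find_prog_type(enum bpf_prog_type type,
                                    struct bpf_prog *prog)
          {
                  const struct bpf_prog_ops *ops;
          
                  if (type >= ARRAY_SIZE(bpf_prog_types))
                          return -EINVAL;
                  /* clamp the index under speculation so a mispredicted
                   * bounds check cannot be used to read out of bounds */
                  type = array_index_nospec(type, ARRAY_SIZE(bpf_prog_types));
                  ops = bpf_prog_types[type];
                  if (!ops)
                          return -EINVAL;
                  prog->aux->ops = ops;
                  prog->type = type;
                  return 0;
          }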
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      d0f1a451
    • D
      bpf: implement ld_abs/ld_ind in native bpf · e0cea7ce
      Committed by Daniel Borkmann
      The main part of this work is to finally allow removal of LD_ABS
      and LD_IND from the BPF core by reimplementing them through native
      eBPF instead. Both LD_ABS/LD_IND were carried over from cBPF, and
      keeping them around in native eBPF caused way more trouble than
      they were actually worth. To list just some of the security issues
      from the past:
      
        * fdfaf64e ("x86: bpf_jit: support negative offsets")
        * 35607b02 ("sparc: bpf_jit: fix loads from negative offsets")
        * e0ee9c12 ("x86: bpf_jit: fix two bugs in eBPF JIT compiler")
        * 07aee943 ("bpf, sparc: fix usage of wrong reg for load_skb_regs after call")
        * 6d59b7db ("bpf, s390x: do not reload skb pointers in non-skb context")
        * 87338c8e ("bpf, ppc64: do not reload skb pointers in non-skb context")
      
      For programs in native eBPF, LD_ABS/LD_IND are pretty much legacy
      these days due to their limitations and more efficient/flexible
      alternatives that have been developed over time such as direct
      packet access. LD_ABS/LD_IND only cover 1/2/4 byte loads into a
      register, the load happens in host endianness, and their exception
      handling can yield unexpected behavior. The latter is explained
      in depth in f6b1b3bf ("bpf: fix subprog verifier bypass by
      div/mod by 0 exception") along with similar exception cases we had.
      In native eBPF, more recent program types disable LD_ABS/LD_IND
      altogether through may_access_skb() in the verifier, and given the
      limitations in terms of exception handling, it's also disabled
      in programs that use BPF to BPF calls.
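      
      For illustration, the direct packet access pattern that supersedes
      LD_ABS/LD_IND in native eBPF looks like this (a minimal XDP sketch;
      program name is ours, headers as shipped with libbpf today):
      
          #include <linux/bpf.h>
          #include <linux/if_ether.h>
          #include <bpf/bpf_helpers.h>
          #include <bpf/bpf_endian.h>
          
          SEC("xdp")
          int parse_eth(struct xdp_md *ctx)
          {
                  void *data     = (void *)(long)ctx->data;
                  void *data_end = (void *)(long)ctx->data_end;
                  struct ethhdr *eth = data;
          
                  /* the verifier requires an explicit bounds check before
                   * any packet byte is dereferenced */
                  if ((void *)(eth + 1) > data_end)
                          return XDP_DROP;
          
                  return eth->h_proto == bpf_htons(ETH_P_IP) ? XDP_PASS
                                                             : XDP_DROP;
          }
          
          char _license[] SEC("license") = "GPL";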
      
      In terms of cBPF, LD_ABS/LD_IND is used in networking programs
      to access packet data. It is not used in seccomp-BPF, but in programs
      that use cBPF for socket filtering or for reuseport demuxing. This is
      mostly relevant for applications that have not yet migrated to
      native eBPF.
      
      The main complexity and source of bugs in LD_ABS/LD_IND comes
      from their implementation in the various JITs. Most of them keep
      the model from cBPF times by implementing a fastpath written
      in asm. They typically use two CPU registers, hidden from the BPF
      program, for caching the skb's headlen (skb->len - skb->data_len)
      and skb->data. Throughout the JIT phase this requires keeping track
      of whether LD_ABS/LD_IND are used and, if so, the two registers need
      to be recached each time a BPF helper changes the underlying
      packet data in the native eBPF case. At least in the eBPF case,
      available CPU registers are rare, and the additional exit path out
      of the asm-written JIT helper also makes it inflexible, since not
      all parts of the JITer are in control from plain C. Implementing
      LD_ABS/LD_IND in eBPF therefore significantly reduces the
      complexity in JITs with comparable performance results for
      them, e.g.:
      
      test_bpf             tcpdump port 22             tcpdump complex
      x64      - before    15 21 10                    14 19  18
               - after      7 10 10                     7 10  15
      arm64    - before    40 91 92                    40 91 151
               - after     51 64 73                    51 62 113
      
      For cBPF we now track any usage of LD_ABS/LD_IND in bpf_convert_filter()
      and cache the skb's headlen and data in the cBPF prologue. The
      BPF_REG_TMP gets remapped from R8 to R2 since it's mainly just
      used as a local temporary variable. This also slightly shrinks the
      image on x86_64 for seccomp programs, since mapping
      to %rsi does not require an ereg. In callee-saved R8 and R9 we now track
      skb data and headlen, respectively. For normal prologue emission
      in the JITs this does not add any extra instructions since R8, R9
      are pushed to stack in any case from eBPF side. cBPF uses the
      convert_bpf_ld_abs() emitter which probes the fast path inline
      already and falls back to bpf_skb_load_helper_{8,16,32}() helper
      relying on the cached skb data and headlen as well. R8 and R9
      never need to be reloaded due to bpf_helper_changes_pkt_data()
      since all skb access in cBPF is read-only. Then, for the case
      of native eBPF, we use the bpf_gen_ld_abs() emitter, which calls
      the bpf_skb_load_helper_{8,16,32}_no_cache() helper unconditionally,
      and neither caches skb data and headlen nor has an inlined fast
      path. The reason for the latter is that native eBPF does not have
      any extra registers available anyway, but even if there were, it
      avoids any reload of skb data and headlen in the first place.
      Additionally, for the negative offsets, we provide an alternative
      bpf_skb_load_bytes_relative() helper in eBPF which operates
      similarly to bpf_skb_load_bytes() and allows for more flexibility.
      Tested myself on x64, arm64, s390x, from Sandipan on ppc64.
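      
      A hedged usage sketch of the new helper from BPF program context
      (constants per uapi/linux/bpf.h; the wrapper function is ours):
      
          /* load the ethernet header relative to the MAC header; unlike
           * LD_ABS this handles the negative-offset cases cleanly */
          static __always_inline int load_eth(struct __sk_buff *skb,
                                              struct ethhdr *eth)
          {
                  return bpf_skb_load_bytes_relative(skb, 0, eth, sizeof(*eth),
                                                     BPF_HDR_START_MAC);
          }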
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      e0cea7ce
    • M
      bpf: fix possible spectre-v1 in find_and_alloc_map() · 9ef09e35
      Committed by Mark Rutland
      It's possible for userspace to control attr->map_type. Sanitize it when
      using it as an array index to prevent an out-of-bounds value being used
      under speculation.
      
      Found by smatch.
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Dan Carpenter <dan.carpenter@oracle.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: netdev@vger.kernel.org
      Acked-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      9ef09e35
    • B
      bpf: introduce new bpf AF_XDP map type BPF_MAP_TYPE_XSKMAP · fbfc504a
      Committed by Björn Töpel
      The xskmap is yet another BPF map, very much inspired by
      dev/cpu/sockmap, and is a holder of AF_XDP sockets. A user application
      adds AF_XDP sockets into the map, and by using the bpf_redirect_map
      helper, an XDP program can redirect XDP frames to an AF_XDP socket.
      
      Note that a socket that is bound to a certain ifindex/queue index will
      *only* accept XDP frames from that netdev/queue index. If an XDP
      program tries to redirect from a netdev/queue index other than what
      the socket is bound to, the frame will not be received on the socket.
      
      A socket can reside in multiple maps.
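      
      A hedged sketch of the intended usage from an XDP program (map name
      and sizing are ours; modern libbpf map syntax):
      
          #include <linux/bpf.h>
          #include <bpf/bpf_helpers.h>
          
          struct {
                  __uint(type, BPF_MAP_TYPE_XSKMAP);
                  __uint(max_entries, 64);
                  __type(key, __u32);
                  __type(value, __u32);
          } xsks_map SEC(".maps");
          
          SEC("xdp")
          int redirect_to_xsk(struct xdp_md *ctx)
          {
                  /* steer the frame to the AF_XDP socket bound to this queue */
                  return bpf_redirect_map(&xsks_map, ctx->rx_queue_index, 0);
          }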
      
      v3: Fixed race and simplified code.
      v2: Removed one indirection in map lookup.
      Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      fbfc504a
  12. 03 May 2018, 5 commits
    • Z
      tracing: Fix the file mode of stack tracer · 0c5a9acc
      Committed by Zhengyuan Liu
      It looks weird that the stack_trace_filter file can be written by root
      but "ll" shows that it does not have write permission.
      
      Link: http://lkml.kernel.org/r/1518054113-28096-1-git-send-email-liuzhengyuan@kylinos.cn
      Signed-off-by: Zhengyuan Liu <liuzhengyuan@kylinos.cn>
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      0c5a9acc
    • C
      ftrace: Have set_graph_* files have normal file modes · 1ce0500d
      Committed by Chen LinX
      The set_graph_function and set_graph_notrace file modes should be 0644
      instead of 0444 as they are writeable. Note, the mode appears to be
      ignored regardless, but they should at least look sane.
      
      Link: http://lkml.kernel.org/r/1409725869-4501-1-git-send-email-linx.z.chen@intel.com
      Acked-by: Namhyung Kim <namhyung@kernel.org>
      Signed-off-by: Chen LinX <linx.z.chen@intel.com>
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      1ce0500d
    • J
      bpf: sockmap, fix error handling in redirect failures · abaeb096
      Committed by John Fastabend
      When a redirect failure happens we release the in-flight buffers
      without calling sk_mem_uncharge(); the uncharge is done before
      dropping the sock lock for the redirect. However, we missed updating
      the ring start index. When no apply actions are in progress this
      is OK because we uncharge the entire buffer before the redirect.
      But, when we have apply logic running, it's possible that only a
      portion of the buffer is being redirected. In this case we only
      do memory accounting for the buffer slice being redirected and
      expect to be able to loop over the BPF program again and/or, if
      a sock is closed, uncharge the memory at sock destruct time.
      
      With an invalid start index however the program logic looks at
      the start pointer index, checks the length, and when seeing the
      length is zero (from the initial release and failure to update
      the pointer) aborts without uncharging/releasing the remaining
      memory.
      
      The fix for this is simply to update the start index. To avoid
      fixing this error in two locations we do a small refactor and
      remove one case where it is open-coded. Then fix it in the
      single function.
      Signed-off-by: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      abaeb096
    • J
      bpf: sockmap, zero sg_size on error when buffer is released · fec51d40
      Committed by John Fastabend
      When an error occurs during a redirect we have two cases that need
      to be handled (i) we have a cork'ed buffer (ii) we have a normal
      sendmsg buffer.
      
      In the cork'ed buffer case we don't currently support recovering from
      errors in a redirect action. So the buffer is released and the error
      should _not_ be pushed back to the caller of sendmsg/sendpage. The
      rationale here is that the user would get an error relating to old
      data that may have been sent by some arbitrary thread on that sock.
      Instead we simply consume the data and tell the user that the data
      has been consumed. We may add proper error recovery in the future.
      However, this patch fixes a bug where the outstanding-bytes counter
      sg_size was not zeroed. This could result in a case where, if the user
      has both a cork'ed action and an apply action in progress, we may
      incorrectly call into the BPF program when the user expected an
      old verdict to be applied via the apply action. I don't have a use
      case where using apply and cork at the same time is valid, but we
      never explicitly reject it because it should work fine. This patch
      ensures the sg_size is zeroed so we don't have this case.
      
      In the normal sendmsg buffer case (no cork data) we also do not
      zero sg_size. Again this can confuse the apply logic when the logic
      calls into the BPF program but the BPF programmer expected the old
      verdict to remain. So ensure we set sg_size to zero here as well. And
      additionally, to keep the psock state in sync with the sk_msg_buff,
      release all the memory as well. Previously we did this before
      returning to the user, but this left a gap where the psock and
      sk_msg_buff states were out of sync, which seems fragile. No
      additional overhead is taken here except for a call to check the
      length and realize it has already been freed. This is in the error
      path as well, so in my opinion let's have robust code over optimized
      error paths.
      Signed-off-by: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      fec51d40
    • J
      bpf: sockmap, fix scatterlist update on error path in send with apply · 3cc9a472
      Committed by John Fastabend
      When the call to do_tcp_sendpage() fails to send the complete block
      requested, we either retry if only a partial send was completed or
      abort if we receive an error less than or equal to zero. Before
      returning, though, we must update the scatterlist length/offset to
      account for any partial send completed.
      
      Before this patch we did this at the end of the retry loop, but
      this was buggy when used while applying a verdict to fewer bytes
      than in the scatterlist. When the scatterlist length was being set
      we forgot to account for the apply logic reducing the size variable.
      The result was that we chopped off some bytes in the scatterlist
      without doing proper cleanup on them. This results in a WARNING when
      the sock is torn down, because the bytes have previously been charged
      to the socket but are never uncharged.
      
      The fix is to simply do the accounting inside the retry loop,
      subtracting from the absolute scatterlist values rather than trying
      to accumulate the totals and subtract at the end.
      Reported-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      3cc9a472