1. 18 March 2018, 2 commits
  2. 17 March 2018, 2 commits
  3. 04 March 2018, 1 commit
  4. 26 February 2018, 7 commits
    • batman-adv: Fix internal interface indices types · f22e0893
      Committed by Sven Eckelmann
      batman-adv uses internal indices for each enabled and active interface.
      It is currently used by the B.A.T.M.A.N. IV algorithm to identify the
      correct position in the ogm_cnt bitmaps.
      
      The type for the number of enabled interfaces (which defines the next
      interface index) was set to char. This type can be (depending on the
      architecture) either signed (limiting batman-adv to 127 active slave
      interfaces) or unsigned (limiting batman-adv to 255 active slave
      interfaces).
      
      This limit was not correctly checked when an interface was enabled and thus
      an overflow happened. This was only caught on systems with the signed char
      type when the B.A.T.M.A.N. IV code tried to resize its counter arrays with
      a negative size.
      
      The if_num interface index was only an s16 and therefore significantly
      smaller than the ifindex (int) used by the net core code.
      
      Both &batadv_hard_iface->if_num and &batadv_priv->num_ifaces must be
      (unsigned) int to support the same number of slave interfaces as the net
      core code. And the interface activation code must check the number of
      active slave interfaces to avoid integer overflows.
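      
      A minimal sketch of that guard, with illustrative names and plain C
      types rather than the actual batman-adv structures:
      
          #include <errno.h>
          #include <limits.h>
          
          struct mesh_priv {
                  unsigned int num_ifaces; /* was char: could wrap */
          };
          
          /* Return the next interface index, or -ENOSPC when none is
           * left; capping at INT_MAX keeps the index representable as
           * an int, matching the net core's ifindex type. */
          static int mesh_enable_iface(struct mesh_priv *priv)
          {
                  if (priv->num_ifaces >= INT_MAX)
                          return -ENOSPC;
                  return (int)priv->num_ifaces++;
          }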
      
      Fixes: c6c8fea2 ("net: Add batman-adv meshing protocol")
      Signed-off-by: Sven Eckelmann <sven@narfation.org>
      Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
    • batman-adv: Fix netlink dumping of BLA backbones · fce672db
      Committed by Sven Eckelmann
      The function batadv_bla_backbone_dump_bucket must be able to handle
      incomplete dumps of a single bucket. It tries to do that by saving the
      latest dumped index in *idx_skip to inform the caller about the current
      state.
      
      But the caller only assumes that buckets were not completely dumped when
      the return code is non-zero. This function must therefore also return a
      non-zero index when the dumping of an entry failed. Otherwise the caller
      will just skip all remaining buckets.
      
      And the function must also reset *idx_skip back to zero when it finished a
      bucket. Otherwise it will skip the same number of entries in the next
      bucket as the previous one had.
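      
      A self-contained sketch of that resume contract; the list and entry
      types are illustrative stand-ins for the batman-adv code:
      
          struct entry { struct entry *next; };
          struct bucket { struct entry *head; };
          
          /* illustrative: returns 0 iff the entry fit into the message */
          extern int dump_entry(struct entry *e);
          
          static int dump_bucket(struct bucket *b, int *idx_skip)
          {
                  int idx = 0;
                  struct entry *e;
          
                  for (e = b->head; e; e = e->next) {
                          if (idx++ < *idx_skip)
                                  continue; /* dumped in an earlier pass */
                          if (dump_entry(e)) {
                                  *idx_skip = idx - 1; /* resume here */
                                  return 1; /* bucket incomplete */
                          }
                  }
                  *idx_skip = 0; /* done: next bucket starts at entry 0 */
                  return 0;
          }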
      
      Fixes: ea4152e1 ("batman-adv: add backbone table netlink support")
      Reported-by: Linus Lüssing <linus.luessing@c0d3.blue>
      Signed-off-by: Sven Eckelmann <sven@narfation.org>
      Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
    • batman-adv: Fix netlink dumping of BLA claims · b0264ecd
      Committed by Sven Eckelmann
      The function batadv_bla_claim_dump_bucket must be able to handle
      incomplete dumps of a single bucket. It tries to do that by saving the
      latest dumped index in *idx_skip to inform the caller about the current
      state.
      
      But the caller only assumes that buckets were not completely dumped when
      the return code is non-zero. This function must therefore also return a
      non-zero index when the dumping of an entry failed. Otherwise the caller
      will just skip all remaining buckets.
      
      And the function must also reset *idx_skip back to zero when it finished a
      bucket. Otherwise it will skip the same number of entries in the next
      bucket as the previous one had.
      
      Fixes: 04f3f5bf ("batman-adv: add B.A.T.M.A.N. Dump BLA claims via netlink")
      Reported-by: Linus Lüssing <linus.luessing@c0d3.blue>
      Signed-off-by: Sven Eckelmann <sven@narfation.org>
      Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
    • batman-adv: Ignore invalid batadv_v_gw during netlink send · 011c935f
      Committed by Sven Eckelmann
      The function batadv_v_gw_dump stops the processing loop when
      batadv_v_gw_dump_entry returns a non-0 return code. This should only
      happen when the buffer is full. Otherwise, an empty message may be
      returned by batadv_gw_dump. This empty message will then stop the netlink
      dumping of gateway entries. At worst, not a single entry is returned to
      userspace even when plenty of possible gateways exist.
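      
      The shape of that fix, sketched with illustrative helpers
      (gw_is_valid() and fill_gw_info() stand in for the real checks):
      
          #include <errno.h>
          
          struct gw_node;
          struct msg_buf;
          
          extern int gw_is_valid(const struct gw_node *gw);
          extern int fill_gw_info(struct msg_buf *msg,
                                  const struct gw_node *gw);
          
          static int gw_dump_entry(struct msg_buf *msg,
                                   const struct gw_node *gw)
          {
                  if (!gw_is_valid(gw))
                          return 0; /* skip it, keep dumping the others */
                  if (fill_gw_info(msg, gw))
                          return -EMSGSIZE; /* buffer full: stop, resume */
                  return 0;
          }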
      
      Fixes: b71bb6f9 ("batman-adv: add B.A.T.M.A.N. V bat_gw_dump implementations")
      Signed-off-by: Sven Eckelmann <sven.eckelmann@openmesh.com>
      Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
    • batman-adv: Ignore invalid batadv_iv_gw during netlink send · 10d57028
      Committed by Sven Eckelmann
      The function batadv_iv_gw_dump stops the processing loop when
      batadv_iv_gw_dump_entry returns a non-0 return code. This should only
      happen when the buffer is full. Otherwise, an empty message may be
      returned by batadv_gw_dump. This empty message will then stop the netlink
      dumping of gateway entries. At worst, not a single entry is returned to
      userspace even when plenty of possible gateways exist.
      
      Fixes: efb766af ("batman-adv: add B.A.T.M.A.N. IV bat_gw_dump implementations")
      Signed-off-by: Sven Eckelmann <sven.eckelmann@openmesh.com>
      Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
    • batman-adv: invalidate checksum on fragment reassembly · 3bf2a09d
      Committed by Matthias Schiffer
      A more sophisticated implementation could try to combine fragment checksums
      when all fragments have CHECKSUM_COMPLETE and are split at even offsets.
      For now, we just set ip_summed to CHECKSUM_NONE to avoid "hw csum failure"
      warnings in the kernel log when fragmented frames are received. In
      consequence, skb_pull_rcsum() can be replaced with skb_pull().
      
      Note that in usual setups, packets don't reach batman-adv with
      CHECKSUM_COMPLETE (I assume NICs bail out of checksumming when they see
      batadv's ethtype?), which is why the log messages do not occur on every
      system using batman-adv. I could reproduce this issue by stacking
      batman-adv on top of a VXLAN interface.
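      
      A kernel-style sketch of the described behavior (the function name
      is illustrative):
      
          #include <linux/skbuff.h>
          
          static void frag_merge_done(struct sk_buff *skb_merged)
          {
                  /* a hardware checksum covered only one fragment; make
                   * the stack verify the merged frame in software */
                  skb_merged->ip_summed = CHECKSUM_NONE;
          }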
      
      Fixes: 610bfc6b ("batman-adv: Receive fragmented packets and merge")
      Tested-by: Maximilian Wilhelm <max@sdn.clinic>
      Signed-off-by: Matthias Schiffer <mschiffer@universe-factory.net>
      Signed-off-by: Sven Eckelmann <sven@narfation.org>
      Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
    • batman-adv: fix packet checksum in receive path · abd63605
      Committed by Matthias Schiffer
      eth_type_trans() internally calls skb_pull(), which does not adjust the
      skb checksum; skb_postpull_rcsum() is necessary to avoid log spam of the
      form "bat0: hw csum failure" when packets with CHECKSUM_COMPLETE are
      received.
      
      Note that in usual setups, packets don't reach batman-adv with
      CHECKSUM_COMPLETE (I assume NICs bail out of checksumming when they see
      batadv's ethtype?), which is why the log messages do not occur on every
      system using batman-adv. I could reproduce this issue by stacking
      batman-adv on top of a VXLAN interface.
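      
      A kernel-style sketch of the receive-path pattern (the function name
      is illustrative):
      
          #include <linux/etherdevice.h>
          #include <linux/skbuff.h>
          
          static void rx_fixup_csum(struct sk_buff *skb,
                                    struct net_device *dev)
          {
                  /* eth_type_trans() pulls ETH_HLEN bytes but leaves
                   * skb->csum untouched; subtract the pulled bytes so
                   * CHECKSUM_COMPLETE stays valid */
                  skb->protocol = eth_type_trans(skb, dev);
                  skb_postpull_rcsum(skb, eth_hdr(skb), ETH_HLEN);
          }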
      
      Fixes: c6c8fea2 ("net: Add batman-adv meshing protocol")
      Tested-by: Maximilian Wilhelm <max@sdn.clinic>
      Signed-off-by: Matthias Schiffer <mschiffer@universe-factory.net>
      Signed-off-by: Sven Eckelmann <sven@narfation.org>
      Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
  5. 12 February 2018, 1 commit
    • vfs: do bulk POLL* -> EPOLL* replacement · a9a08845
      Committed by Linus Torvalds
      This is the mindless scripted replacement of kernel use of POLL*
      variables as described by Al, done by this script:
      
          for V in IN OUT PRI ERR RDNORM RDBAND WRNORM WRBAND HUP RDHUP NVAL MSG; do
              L=`git grep -l -w POLL$V | grep -v '^t' | grep -v /um/ | grep -v '^sa' | grep -v '/poll.h$'|grep -v '^D'`
              for f in $L; do sed -i "-es/^\([^\"]*\)\(\<POLL$V\>\)/\\1E\\2/" $f; done
          done
      
      with de-mangling cleanups yet to come.
      
      NOTE! On almost all architectures, the EPOLL* constants have the same
      values as the POLL* constants do.  But the keyword here is "almost".
      For various bad reasons they aren't the same, and epoll() doesn't
      actually work quite correctly in some cases due to this on Sparc et al.
      
      The next patch from Al will sort out the final differences, and we
      should be all done.
      Scripted-by: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  6. 10 February 2018, 1 commit
    • sctp: verify size of a new chunk in _sctp_make_chunk() · 07f2c7ab
      Committed by Alexey Kodanev
      When SCTP makes INIT or INIT_ACK packet the total chunk length
      can exceed SCTP_MAX_CHUNK_LEN which leads to kernel panic when
      transmitting these packets, e.g. the crash on sending INIT_ACK:
      
      [  597.804948] skbuff: skb_over_panic: text:00000000ffae06e4 len:120168
                     put:120156 head:000000007aa47635 data:00000000d991c2de
                     tail:0x1d640 end:0xfec0 dev:<NULL>
      ...
      [  597.976970] ------------[ cut here ]------------
      [  598.033408] kernel BUG at net/core/skbuff.c:104!
      [  600.314841] Call Trace:
      [  600.345829]  <IRQ>
      [  600.371639]  ? sctp_packet_transmit+0x2095/0x26d0 [sctp]
      [  600.436934]  skb_put+0x16c/0x200
      [  600.477295]  sctp_packet_transmit+0x2095/0x26d0 [sctp]
      [  600.540630]  ? sctp_packet_config+0x890/0x890 [sctp]
      [  600.601781]  ? __sctp_packet_append_chunk+0x3b4/0xd00 [sctp]
      [  600.671356]  ? sctp_cmp_addr_exact+0x3f/0x90 [sctp]
      [  600.731482]  sctp_outq_flush+0x663/0x30d0 [sctp]
      [  600.788565]  ? sctp_make_init+0xbf0/0xbf0 [sctp]
      [  600.845555]  ? sctp_check_transmitted+0x18f0/0x18f0 [sctp]
      [  600.912945]  ? sctp_outq_tail+0x631/0x9d0 [sctp]
      [  600.969936]  sctp_cmd_interpreter.isra.22+0x3be1/0x5cb0 [sctp]
      [  601.041593]  ? sctp_sf_do_5_1B_init+0x85f/0xc30 [sctp]
      [  601.104837]  ? sctp_generate_t1_cookie_event+0x20/0x20 [sctp]
      [  601.175436]  ? sctp_eat_data+0x1710/0x1710 [sctp]
      [  601.233575]  sctp_do_sm+0x182/0x560 [sctp]
      [  601.284328]  ? sctp_has_association+0x70/0x70 [sctp]
      [  601.345586]  ? sctp_rcv+0xef4/0x32f0 [sctp]
      [  601.397478]  ? sctp6_rcv+0xa/0x20 [sctp]
      ...
      
      Here the chunk size for INIT_ACK packet becomes too big, mostly
      because of the state cookie (INIT packet has large size with
      many address parameters), plus additional server parameters.
      
      Later this chunk causes the panic in skb_put_data():
      
        sctp_packet_transmit()
            sctp_packet_pack()
                skb_put_data(nskb, chunk->skb->data, chunk->skb->len);
      
      'nskb' (head skb) was previously allocated with packet->size
      from u16 'chunk->chunk_hdr->length'.
      
      As suggested by Marcelo, we should check the chunk's length in
      _sctp_make_chunk() before trying to allocate an skb for it and
      discard the chunk if its size is bigger than SCTP_MAX_CHUNK_LEN.
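      
      A sketch of that check; the constant is illustrative (the kernel
      derives SCTP_MAX_CHUNK_LEN from skb and header sizes):
      
          #include <errno.h>
          #include <stddef.h>
          
          #define MAX_CHUNK_LEN 65532u /* illustrative bound */
          
          /* Return 0 if a chunk with this header + payload may be built;
           * the caller discards the oversized chunk otherwise. */
          static int chunk_len_ok(size_t hdrlen, size_t paylen)
          {
                  if (paylen > MAX_CHUNK_LEN - hdrlen)
                          return -EINVAL;
                  return 0;
          }
      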
      Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com>
      Acked-by: Marcelo Ricardo Leitner <marcelo.leinter@gmail.com>
      Acked-by: Neil Horman <nhorman@tuxdriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  7. 09 February 2018, 11 commits
    • SUNRPC: Don't call __UDPX_INC_STATS() from a preemptible context · 0afa6b44
      Committed by Trond Myklebust
      Calling __UDPX_INC_STATS() from a preemptible context leads to a
      warning of the form:
      
       BUG: using __this_cpu_add() in preemptible [00000000] code: kworker/u5:0/31
       caller is xs_udp_data_receive_workfn+0x194/0x270
       CPU: 1 PID: 31 Comm: kworker/u5:0 Not tainted 4.15.0-rc8-00076-g90ea9f1b #2
       Workqueue: xprtiod xs_udp_data_receive_workfn
       Call Trace:
        dump_stack+0x85/0xc1
        check_preemption_disabled+0xce/0xe0
        xs_udp_data_receive_workfn+0x194/0x270
        process_one_work+0x318/0x620
        worker_thread+0x20a/0x390
        ? process_one_work+0x620/0x620
        kthread+0x120/0x130
        ? __kthread_bind_mask+0x60/0x60
        ret_from_fork+0x24/0x30
      
      Since we're taking a spinlock in those functions anyway, let's fix the
      issue by moving the call so that it occurs under the spinlock.
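      
      The shape of the move, sketched with the names that appear in the
      trace above (the exact lock and function differ in detail):
      
          spin_lock(&xprt->recv_lock);
          /* ... queue the received reply ... */
          /* spin_lock() disables preemption, so the
           * __this_cpu_add()-based macro is now safe to call */
          __UDPX_INC_STATS(sk, UDP_MIB_INDATAGRAMS);
          spin_unlock(&xprt->recv_lock);
      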
      Reported-by: kernel test robot <fengguang.wu@intel.com>
      Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
    • fix parallelism for rpc tasks · f515f86b
      Committed by Olga Kornievskaia
      Hi folks,
      
      On a multi-core machine, is it expected that we can have parallel RPCs
      handled by each of the per-core workqueue?
      
      In testing a read workload, observing via "top" command that a single
      "kworker" thread is running servicing the requests (no parallelism).
      It's more prominent while doing these operations over krb5p mount.
      
      What has been suggested by Bruce is to try this and in my testing I
      see then the read workload spread among all the kworker threads.
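      
      The commit text does not quote the change; a plausible sketch,
      assuming the fix makes the RPC workqueue unbound so the scheduler
      may spread queued work items across CPUs:
      
          #include <linux/workqueue.h>
          
          static struct workqueue_struct *start_rpc_wq(void)
          {
                  /* WQ_UNBOUND: items are not pinned to the queueing CPU */
                  return alloc_workqueue("rpciod",
                                         WQ_MEM_RECLAIM | WQ_UNBOUND, 0);
          }
      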
      Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
      Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
    • tipc: fix skb truesize/datasize ratio control · 55b3280d
      Committed by Hoang Le
      In commit d618d09a ("tipc: enforce valid ratio between skb truesize
      and contents") we introduced a test for ensuring that the condition
      truesize/datasize <= 4 is true for a received buffer. Unfortunately this
      test has two problems.
      
      - Because of the integer arithmetic, the test
        if (skb->truesize / buf_roundup_len(skb) > 4) will miss all
        ratios [4 < ratio < 5], which was not the intention.
      - The buffer returned by skb_copy() inherits skb->truesize of the
        original buffer, which doesn't help the situation at all.
      
      In this commit, we change the ratio condition and replace skb_copy()
      with a call to skb_copy_expand() to finally get this right.
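      
      The first point can be shown in plain C: integer division truncates,
      so the test must be rewritten as a multiplication (a sketch, not the
      tipc source):
      
          #include <stdbool.h>
          #include <stddef.h>
          
          static bool ratio_exceeds_4(size_t truesize, size_t datasize)
          {
                  /* buggy: truesize / datasize > 4 only fires at >= 5 */
                  return truesize > 4 * datasize; /* any ratio above 4 */
          }
      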
      Acked-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/sched: cls_u32: fix cls_u32 on filter replace · eb53f7af
      Committed by Ivan Vecera
      The following sequence is currently broken:
      
       # tc qdisc add dev foo ingress
       # tc filter replace dev foo protocol all ingress \
         u32 match u8 0 0 action mirred egress mirror dev bar1
       # tc filter replace dev foo protocol all ingress \
         handle 800::800 pref 49152 \
         u32 match u8 0 0 action mirred egress mirror dev bar2
       Error: cls_u32: Key node flags do not match passed flags.
       We have an error talking to the kernel, -1
      
      The error comes from u32_change() when comparing new and
      existing flags. The existing ones always contains one of
      TCA_CLS_FLAGS_{,NOT}_IN_HW flag depending on offloading state.
      These flags cannot be passed from userspace so the condition
      (n->flags != flags) in u32_change() always fails.
      
      Fix the condition so the flags TCA_CLS_FLAGS_NOT_IN_HW and
      TCA_CLS_FLAGS_IN_HW are not taken into account.
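      
      A sketch of the corrected comparison; the helper is illustrative
      while the flag bits follow the UAPI definitions:
      
          #include <stdbool.h>
          #include <stdint.h>
          
          #define FL_IN_HW     (1u << 2) /* TCA_CLS_FLAGS_IN_HW */
          #define FL_NOT_IN_HW (1u << 3) /* TCA_CLS_FLAGS_NOT_IN_HW */
          
          static bool u32_flags_differ(uint32_t old_flags,
                                       uint32_t new_flags)
          {
                  /* mask out the kernel-owned offload-status bits */
                  uint32_t mask = ~(FL_IN_HW | FL_NOT_IN_HW);
          
                  return (old_flags & mask) != (new_flags & mask);
          }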
      
      Fixes: 24d3dc6d ("net/sched: cls_u32: Reflect HW offload status")
      Signed-off-by: Ivan Vecera <ivecera@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mpls, nospec: Sanitize array index in mpls_label_ok() · 3968523f
      Committed by Dan Williams
      mpls_label_ok() validates that the 'platform_label' array index from a
      userspace netlink message payload is valid. Under speculation the
      mpls_label_ok() result may not resolve in the CPU pipeline until after
      the index is used to access an array element. Sanitize the index to zero
      to prevent userspace-controlled arbitrary out-of-bounds speculation, a
      precursor for a speculative execution side channel vulnerability.
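      
      The canonical pattern: array_index_nospec() is the real helper from
      linux/nospec.h, while the wrapper is illustrative:
      
          #include <linux/nospec.h>
          
          static unsigned int sane_label_index(unsigned int index,
                                               unsigned int size)
          {
                  /* Under speculation this clamps index to 0 when
                   * index >= size, so a mispredicted bounds check cannot
                   * steer an out-of-bounds load. */
                  return array_index_nospec(index, size);
          }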
      
      Cc: <stable@vger.kernel.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • rds: tcp: use rds_destroy_pending() to synchronize netns/module teardown and rds connection/workq management · ebeeb1ad
      Committed by Sowmini Varadhan
      
      An rds_connection can get added during netns deletion between lines 528
      and 529 of
      
        506 static void rds_tcp_kill_sock(struct net *net)
        :
        /* code to pull out all the rds_connections that should be destroyed */
        :
        528         spin_unlock_irq(&rds_tcp_conn_lock);
        529         list_for_each_entry_safe(tc, _tc, &tmp_list, t_tcp_node)
        530                 rds_conn_destroy(tc->t_cpath->cp_conn);
      
      Such an rds_connection would miss the rds_conn_destroy()
      loop (which cancels all pending work) and (if it was scheduled
      after netns deletion) could trigger the use-after-free.
      
      A similar race window exists for the module unload path
      in rds_tcp_exit -> rds_tcp_destroy_conns.
      
      Concurrency with netns deletion (rds_tcp_kill_sock()) must be handled
      by checking check_net() before enqueuing new work or adding new
      connections.
      
      Concurrency with module-unload is handled by maintaining a module
      specific flag that is set at the start of the module exit function,
      and must be checked before enqueuing new work or adding new connections.
      
      This commit refactors existing RDS_DESTROY_PENDING checks added by
      commit 3db6e0d1 ("rds: use RCU to synchronize work-enqueue with
      connection teardown") and consolidates all the concurrency checks
      listed above into the function rds_destroy_pending().
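      
      A sketch of the consolidated predicate (the flag name is
      illustrative; check_net() is the real helper):
      
          #include <linux/compiler.h>
          #include <linux/types.h>
          #include <net/net_namespace.h>
          
          static bool rds_module_exiting; /* set when module exit starts */
          
          static bool destroy_pending(struct net *net)
          {
                  /* No new work or connections once the netns is being
                   * torn down or module unload has begun. */
                  return !check_net(net) || READ_ONCE(rds_module_exiting);
          }
      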
      Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
      Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: Whitelist the skbuff_head_cache "cb" field · 79a8a642
      Committed by Kees Cook
      Most callers of put_cmsg() use a "sizeof(foo)" for the length argument.
      Within put_cmsg(), a copy_to_user() call is made with a dynamic size, as a
      result of the cmsg header calculations. This means that hardened usercopy
      will examine the copy, even though it was technically a fixed size and
      should be implicitly whitelisted. All the put_cmsg() calls being built
      from values in skbuff_head_cache are coming out of the protocol-defined
      "cb" field, so whitelist this field entirely instead of creating per-use
      bounce buffers, for which there are concerns about performance.
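      
      A sketch of the whitelisting call; kmem_cache_create_usercopy() is
      the real API, while the flags shown are illustrative:
      
          #include <linux/skbuff.h>
          #include <linux/slab.h>
          #include <linux/stddef.h>
          
          static struct kmem_cache *make_skb_cache(void)
          {
                  /* Only the protocol-private "cb" region may cross the
                   * user/kernel boundary under hardened usercopy. */
                  return kmem_cache_create_usercopy("skbuff_head_cache",
                                    sizeof(struct sk_buff), 0,
                                    SLAB_HWCACHE_ALIGN,
                                    offsetof(struct sk_buff, cb),
                                    sizeof_field(struct sk_buff, cb),
                                    NULL);
          }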
      
      Original report was:
      
      Bad or missing usercopy whitelist? Kernel memory exposure attempt detected from SLAB object 'skbuff_head_cache' (offset 64, size 16)!
      WARNING: CPU: 0 PID: 3663 at mm/usercopy.c:81 usercopy_warn+0xdb/0x100 mm/usercopy.c:76
      ...
       __check_heap_object+0x89/0xc0 mm/slab.c:4426
       check_heap_object mm/usercopy.c:236 [inline]
       __check_object_size+0x272/0x530 mm/usercopy.c:259
       check_object_size include/linux/thread_info.h:112 [inline]
       check_copy_size include/linux/thread_info.h:143 [inline]
       copy_to_user include/linux/uaccess.h:154 [inline]
       put_cmsg+0x233/0x3f0 net/core/scm.c:242
       sock_recv_errqueue+0x200/0x3e0 net/core/sock.c:2913
       packet_recvmsg+0xb2e/0x17a0 net/packet/af_packet.c:3296
       sock_recvmsg_nosec net/socket.c:803 [inline]
       sock_recvmsg+0xc9/0x110 net/socket.c:810
       ___sys_recvmsg+0x2a4/0x640 net/socket.c:2179
       __sys_recvmmsg+0x2a9/0xaf0 net/socket.c:2287
       SYSC_recvmmsg net/socket.c:2368 [inline]
       SyS_recvmmsg+0xc4/0x160 net/socket.c:2352
       entry_SYSCALL_64_fastpath+0x29/0xa0
      
      Reported-by: syzbot+e2d6cfb305e9f3911dea@syzkaller.appspotmail.com
      Fixes: 6d07d1cd ("usercopy: Restrict non-usercopy caches to size 0")
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • rtnetlink: require unique netns identifier · 4ff66cae
      Committed by Christian Brauner
      Since we've added support for IFLA_IF_NETNSID for RTM_{DEL,GET,SET,NEW}LINK
      it is possible for userspace to send us requests with three different
      properties to identify a target network namespace. This affects at least
      RTM_{NEW,SET}LINK. Each of them could potentially refer to a different
      network namespace which is confusing. For legacy reasons the kernel will
      pick the IFLA_NET_NS_PID property first and then look for the
      IFLA_NET_NS_FD property but there is no reason to extend this type of
      behavior to network namespace ids. The regression potential is quite
      minimal since the rtnetlink requests in question either won't allow
      IFLA_IF_NETNSID requests before 4.16 is out (RTM_{NEW,SET}LINK) or don't
      support IFLA_NET_NS_{PID,FD} (RTM_{DEL,GET}LINK) in the first place.
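      
      The rule reduces to a mutual-exclusion check, sketched here with an
      illustrative helper (not the kernel function):
      
          #include <errno.h>
          #include <stdbool.h>
          
          static int ensure_unique_netns(bool has_netnsid, bool has_pid,
                                         bool has_fd)
          {
                  /* A request may name the target netns at most one way. */
                  if (has_netnsid && (has_pid || has_fd))
                          return -EOPNOTSUPP;
                  return 0;
          }
      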
      Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
      Acked-by: Jiri Benc <jbenc@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • netlink: ensure to loop over all netns in genlmsg_multicast_allns() · cb9f7a9a
      Committed by Nicolas Dichtel
      Nowadays, nlmsg_multicast() returns only 0 or -ESRCH, but this was not
      the case when commit 134e6375 was pushed.
      However, there was no reason to stop the loop if a netns does not have
      listeners.
      Return -ESRCH only if there were no listeners in any netns.
      
      To avoid having the same problem in the future, I did not assume
      that nlmsg_multicast() returns only 0 or -ESRCH.
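      
      A sketch of the loop contract with illustrative iteration helpers:
      -ESRCH from one netns must not stop the walk, and delivery anywhere
      counts as overall success:
      
          #include <errno.h>
          
          struct netns;
          extern struct netns *next_net(struct netns *net); /* illustrative */
          extern int send_one_ns(struct netns *net); /* 0, -ESRCH, or error */
          
          static int multicast_allns(struct netns *first)
          {
                  int delivered = 0;
                  struct netns *net;
          
                  for (net = first; net; net = next_net(net)) {
                          int err = send_one_ns(net);
          
                          if (!err)
                                  delivered = 1;
                          else if (err != -ESRCH)
                                  return err; /* real errors still stop */
                  }
                  return delivered ? 0 : -ESRCH;
          }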
      
      Fixes: 134e6375 ("genetlink: make netns aware")
      CC: Johannes Berg <johannes.berg@intel.com>
      Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • rxrpc: Don't put crypto buffers on the stack · 8c2f826d
      Committed by David Howells
      Don't put buffers of data to be handed to crypto on the stack as this may
      cause an assertion failure in the kernel (see below).  Fix this by using a
      kmalloc'd buffer instead.
      
      kernel BUG at ./include/linux/scatterlist.h:147!
      ...
      RIP: 0010:rxkad_encrypt_response.isra.6+0x191/0x1b0 [rxrpc]
      RSP: 0018:ffffbe2fc06cfca8 EFLAGS: 00010246
      RAX: 0000000000000000 RBX: ffff989277d59900 RCX: 0000000000000028
      RDX: 0000259dc06cfd88 RSI: 0000000000000025 RDI: ffffbe30406cfd88
      RBP: ffffbe2fc06cfd60 R08: ffffbe2fc06cfd08 R09: ffffbe2fc06cfd08
      R10: 0000000000000000 R11: 0000000000000000 R12: 1ffff7c5f80d9f95
      R13: ffffbe2fc06cfd88 R14: ffff98927a3f7aa0 R15: ffffbe2fc06cfd08
      FS:  0000000000000000(0000) GS:ffff98927fc00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 000055b1ff28f0f8 CR3: 000000001b412003 CR4: 00000000003606f0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       rxkad_respond_to_challenge+0x297/0x330 [rxrpc]
       rxrpc_process_connection+0xd1/0x690 [rxrpc]
       ? process_one_work+0x1c3/0x680
       ? __lock_is_held+0x59/0xa0
       process_one_work+0x249/0x680
       worker_thread+0x3a/0x390
       ? process_one_work+0x680/0x680
       kthread+0x121/0x140
       ? kthread_create_worker_on_cpu+0x70/0x70
       ret_from_fork+0x3a/0x50
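      
      A kernel-style sketch of the pattern (the function is illustrative):
      scatterlists must reference directly mapped memory, which vmapped
      stacks do not guarantee, hence the kmalloc'd buffer:
      
          #include <linux/scatterlist.h>
          #include <linux/slab.h>
          
          static void *alloc_crypto_buf(size_t len, struct scatterlist *sg)
          {
                  void *buf = kmalloc(len, GFP_NOFS);
          
                  if (buf)
                          sg_init_one(sg, buf, len); /* not on the stack */
                  return buf;
          }
      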
      Reported-by: Jonathan Billings <jsbillings@jsbillings.org>
      Reported-by: Marc Dionne <marc.dionne@auristor.com>
      Signed-off-by: David Howells <dhowells@redhat.com>
      Tested-by: Jonathan Billings <jsbillings@jsbillings.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • svcrdma: Fix Read chunk round-up · 175e0310
      Committed by Chuck Lever
      A single NFSv4 WRITE compound can often have three operations:
      PUTFH, WRITE, then GETATTR.
      
      When the WRITE payload is sent in a Read chunk, the client places
      the GETATTR in the inline part of the RPC/RDMA message, just after
      the WRITE operation (sans payload). The position value in the Read
      chunk enables the receiver to insert the Read chunk at the correct
      place in the received XDR stream; that is between the WRITE and
      GETATTR.
      
      According to RFC 8166, an NFS/RDMA client does not have to add XDR
      round-up to the Read chunk that carries the WRITE payload. The
      receiver adds XDR round-up padding if it is absent and the
      receiver's XDR decoder requires it to be present.
      
      Commit 193bcb7b ("svcrdma: Populate tail iovec when receiving")
      attempted to add support for receiving such a compound so that just
      the WRITE payload appears in rq_arg's page list, and the trailing
      GETATTR is placed in rq_arg's tail iovec. (TCP just strings the
      whole compound into the head iovec and page list, without regard
      to the alignment of the WRITE payload).
      
      The server transport logic also had to accommodate the optional XDR
      round-up of the Read chunk, which it did simply by lengthening the
      tail iovec when round-up was needed. This approach is adequate for
      the NFSv2 and NFSv3 WRITE decoders.
      
      Unfortunately it is not sufficient for nfsd4_decode_write. When the
      Read chunk length is a couple of bytes less than PAGE_SIZE, the
      computation at the end of nfsd4_decode_write allows argp->pagelen to
      go negative, which breaks the logic in read_buf that looks for the
      tail iovec.
      
      The result is that a WRITE operation whose payload length is just
      less than a multiple of a page succeeds, but the subsequent GETATTR
      in the same compound fails with NFS4ERR_OP_ILLEGAL because the XDR
      decoder can't find it. Clients ignore the error, but they must
      update their attribute cache via a separate round trip.
      
      As nfsd4_decode_write appears to expect the payload itself to always
      have appropriate XDR round-up, have svc_rdma_build_normal_read_chunk
      add the Read chunk XDR round-up to the page_len rather than
      lengthening the tail iovec.
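      
      The adjustment sketched (helper is illustrative): XDR pads every
      item to a 4-byte boundary, and that pad is added to page_len rather
      than to the tail iovec:
      
          /* Bytes of XDR round-up needed after a chunk of this length. */
          static unsigned int xdr_pad_len(unsigned int chunk_len)
          {
                  return ((chunk_len + 3) & ~3u) - chunk_len; /* 0..3 */
          }
          
          /* e.g. page_len += xdr_pad_len(chunk_len); */
      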
      Reported-by: Olga Kornievskaia <kolga@netapp.com>
      Fixes: 193bcb7b ("svcrdma: Populate tail iovec when receiving")
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Tested-by: Olga Kornievskaia <kolga@netapp.com>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
  8. 08 February 2018, 6 commits
    • tcp: tracepoint: only call trace_tcp_send_reset with full socket · 5c487bb9
      Committed by Song Liu
      The tcp_send_reset tracepoint requires a full socket to work. However, it
      may be called when the socket is in TCP_TIME_WAIT:
      
              case TCP_TW_RST:
                      tcp_v6_send_reset(sk, skb);
                      inet_twsk_deschedule_put(inet_twsk(sk));
                      goto discard_it;
      
      To avoid this problem, this patch checks the socket with sk_fullsock()
      before calling trace_tcp_send_reset().
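      
      The guard itself is a one-line check, sketched in context (the
      wrapper function is illustrative):
      
          #include <net/sock.h>
          #include <trace/events/tcp.h>
          
          static void traced_send_reset(struct sock *sk, struct sk_buff *skb)
          {
                  /* timewait/request socks lack the fields the
                   * tracepoint dereferences */
                  if (sk_fullsock(sk))
                          trace_tcp_send_reset(sk, skb);
                  /* ... build and send the RST ... */
          }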
      
      Fixes: c24b14c4 ("tcp: add tracepoint trace_tcp_send_reset")
      Signed-off-by: Song Liu <songliubraving@fb.com>
      Reviewed-by: Lawrence Brakmo <brakmo@fb.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sch_netem: Bug fixing in calculating Netem interval · 043e337f
      Committed by Md. Islam
      In Kernel 4.15.0+, Netem does not work properly.
      
      Netem setup:
      
      tc qdisc add dev h1-eth0 root handle 1: netem delay 10ms 2ms
      
      Result:
      
      PING 172.16.101.2 (172.16.101.2) 56(84) bytes of data.
      64 bytes from 172.16.101.2: icmp_seq=1 ttl=64 time=22.8 ms
      64 bytes from 172.16.101.2: icmp_seq=2 ttl=64 time=10.9 ms
      64 bytes from 172.16.101.2: icmp_seq=3 ttl=64 time=10.9 ms
      64 bytes from 172.16.101.2: icmp_seq=5 ttl=64 time=11.4 ms
      64 bytes from 172.16.101.2: icmp_seq=6 ttl=64 time=11.8 ms
      64 bytes from 172.16.101.2: icmp_seq=4 ttl=64 time=4303 ms
      64 bytes from 172.16.101.2: icmp_seq=10 ttl=64 time=11.2 ms
      64 bytes from 172.16.101.2: icmp_seq=11 ttl=64 time=10.3 ms
      64 bytes from 172.16.101.2: icmp_seq=7 ttl=64 time=4304 ms
      64 bytes from 172.16.101.2: icmp_seq=8 ttl=64 time=4303 ms
      
      Patch:
      
      (rnd % (2 * sigma)) - sigma was overflowing s32. After applying the
      patch, I found the following output, which is desirable.
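      
      The corrected computation, sketched with userspace types: doing the
      modular arithmetic in 64 bits keeps nanosecond-scale mu/sigma values
      from wrapping a 32-bit intermediate (the ~4303 ms spikes above are
      suggestively close to 2^32 ns ≈ 4295 ms):
      
          #include <stdint.h>
          
          static int64_t jitter_delay(uint32_t rnd, int64_t mu,
                                      int64_t sigma)
          {
                  if (sigma == 0)
                          return mu;
                  /* 2 * sigma and the subtraction cannot wrap in 64 bits */
                  return ((int64_t)rnd % (2 * sigma)) - sigma + mu;
          }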
      
      PING 172.16.101.2 (172.16.101.2) 56(84) bytes of data.
      64 bytes from 172.16.101.2: icmp_seq=1 ttl=64 time=21.1 ms
      64 bytes from 172.16.101.2: icmp_seq=2 ttl=64 time=8.46 ms
      64 bytes from 172.16.101.2: icmp_seq=3 ttl=64 time=9.00 ms
      64 bytes from 172.16.101.2: icmp_seq=4 ttl=64 time=11.8 ms
      64 bytes from 172.16.101.2: icmp_seq=5 ttl=64 time=8.36 ms
      64 bytes from 172.16.101.2: icmp_seq=6 ttl=64 time=11.8 ms
      64 bytes from 172.16.101.2: icmp_seq=7 ttl=64 time=8.11 ms
      64 bytes from 172.16.101.2: icmp_seq=8 ttl=64 time=10.0 ms
      64 bytes from 172.16.101.2: icmp_seq=9 ttl=64 time=11.3 ms
      64 bytes from 172.16.101.2: icmp_seq=10 ttl=64 time=11.5 ms
      64 bytes from 172.16.101.2: icmp_seq=11 ttl=64 time=10.2 ms
      Reviewed-by: Stephen Hemminger <stephen@networkplumber.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/ipv6: onlink nexthop checks should default to main table · 44750f84
      Committed by David Ahern
      Because of differences in how IPv4 and IPv6 handle fib lookups,
      verification of nexthops with the onlink flag needs to default to the
      main table rather than the local table used by IPv4. As it stands, an
      address within a connected route on device 1 can be used with
      onlink on device 2. Looking up the main table properly rejects the
      route due to the egress device mismatch.
      
      Update the extack message as well to show that it could be a device
      mismatch for the nexthop spec.
      
      Fixes: fc1e64e1 ("net/ipv6: Add support for onlink flag")
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/ipv6: Handle reject routes with onlink flag · 58e354c0
      Committed by David Ahern
      Verification of nexthops with the onlink flag needs to handle unreachable
      routes. The lookup is only intended to validate that the gateway address
      is not a local address and, if the gateway resolves, that the egress
      device matches the given device. Hence, hitting any default reject route
      is ok.
      
      Fixes: fc1e64e1 ("net/ipv6: Add support for onlink flag")
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • rxrpc: Fix received abort handling · 17e9e23b
      Committed by David Howells
      AF_RXRPC is incorrectly sending back to the server any abort it receives
      for a client connection.  This is due to the final-ACK offload to the
      connection event processor patch.  The abort code is copied into the
      last-call information on the connection channel and then the event
      processor is set.
      
      Instead, the following should be done:
      
       (1) In the case of a final-ACK for a successful call, the ACK should be
           scheduled as before.
      
       (2) In the case of a locally generated ABORT, the ABORT details should be
           cached for sending in response to further packets related to that
           call and no further action scheduled at call disconnect time.
      
       (3) In the case of an ACK received from the peer, the call should be
           considered dead and no ABORT should be transmitted at this time.  In
           response to further non-ABORT packets from the peer relating to this
           call, an RX_USER_ABORT ABORT should be transmitted.
      
       (4) In the case of a call killed due to network error, an RX_USER_ABORT
           ABORT should be cached for transmission in response to further
           packets, but no ABORT should be sent at this time.
      
      Fixes: 3136ef49 ("rxrpc: Delay terminal ACK transmission on a client call")
      Signed-off-by: David Howells <dhowells@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Make the xprtiod workqueue unbounded. · 90ea9f1b
      Committed by Trond Myklebust
      This should help reduce the latency on replies.
      Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
  9. 07 February 2018, 9 commits