1. 02 10月, 2018 3 次提交
    • L
      tipc: ignore STATE_MSG on wrong link session · d949cfed
      LUU Duc Canh 提交于
      The initial session number when a link is created is based on a random
      value, taken from struct tipc_net->random. It is then incremented for
      each link reset to avoid mixing protocol messages from different link
      sessions.
      
      However, when a bearer is reset all its links are deleted, and will
      later be re-created using the same random value as the first time.
      This means that if the link never went down between creation and
      deletion we will still sometimes have two subsequent sessions with
      the same session number. In virtual environments with potentially
      long transmission times this has turned out to be a real problem.
      
      We now fix this by randomizing the session number each time a link
      is created.
      
      With a session number size of 16 bits this gives a risk of session
      collision of 1/64k. To reduce this further, we also introduce a sanity
      check on the very first STATE message arriving at a link. If this has
      an acknowledge value differing from 0, which is logically impossible,
      we ignore the message. The final risk for session collision is hence
      reduced to 1/4G, which should be sufficient.
      Signed-off-by: NLUU Duc Canh <canh.d.luu@dektech.com.au>
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d949cfed
    • D
      net: sched: act_ipt: check for underflow in __tcf_ipt_init() · aeadd93f
      Dan Carpenter 提交于
      If "td->u.target_size" is larger than sizeof(struct xt_entry_target) we
      return -EINVAL.  But we don't check whether it's smaller than
      sizeof(struct xt_entry_target) and that could lead to an out of bounds
      read.
      
      Fixes: 7ba699c6 ("[NET_SCHED]: Convert actions from rtnetlink to new netlink API")
      Signed-off-by: NDan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      aeadd93f
    • E
      tcp/dccp: fix lockdep issue when SYN is backlogged · 1ad98e9d
      Eric Dumazet 提交于
      In normal SYN processing, packets are handled without listener
      lock and in RCU protected ingress path.
      
      But syzkaller is known to be able to trick us and SYN
      packets might be processed in process context, after being
      queued into socket backlog.
      
      In commit 06f877d6 ("tcp/dccp: fix other lockdep splats
      accessing ireq_opt") I made a very stupid fix, that happened
      to work mostly because of the regular path being RCU protected.
      
      Really the thing protecting ireq->ireq_opt is RCU read lock,
      and the pseudo request refcnt is not relevant.
      
      This patch extends what I did in commit 449809a6 ("tcp/dccp:
      block BH for SYN processing") by adding an extra rcu_read_{lock|unlock}
      pair in the paths that might be taken when processing SYN from
      socket backlog (thus possibly in process context)
      
      Fixes: 06f877d6 ("tcp/dccp: fix other lockdep splats accessing ireq_opt")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: Nsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1ad98e9d
  2. 30 9月, 2018 1 次提交
    • L
      tipc: fix failover problem · c140eb16
      LUU Duc Canh 提交于
      We see the following scenario:
      1) Link endpoint B on node 1 discovers that its peer endpoint is gone.
         Since there is a second working link, failover procedure is started.
      2) Link endpoint A on node 1 sends a FAILOVER message to peer endpoint
         A on node 2. The node item 1->2 goes to state FAILINGOVER.
      3) Linke endpoint A/2 receives the failover, and is supposed to take
         down its parallell link endpoint B/2, while producing a FAILOVER
         message to send back to A/1.
      4) However, B/2 has already been deleted, so no FAILOVER message can
         created.
      5) Node 1->2 remains in state FAILINGOVER forever, refusing to receive
         any messages that can bring B/1 up again. We are left with a non-
         redundant link between node 1 and 2.
      
      We fix this with letting endpoint A/2 build a dummy FAILOVER message
      to send to back to A/1, so that the situation can be resolved.
      Signed-off-by: NLUU Duc Canh <canh.d.luu@dektech.com.au>
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c140eb16
  3. 29 9月, 2018 1 次提交
  4. 28 9月, 2018 9 次提交
    • F
      netfilter: xt_socket: check sk before checking for netns. · 40e4f26e
      Flavio Leitner 提交于
      Only check for the network namespace if the socket is available.
      
      Fixes: f5646501 ("netfilter: check if the socket netns is correct.")
      Reported-by: NGuenter Roeck <linux@roeck-us.net>
      Tested-by: NGuenter Roeck <linux@roeck-us.net>
      Signed-off-by: NFlavio Leitner <fbl@redhat.com>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      40e4f26e
    • T
      netfilter: nft_set_rbtree: add missing rb_erase() in GC routine · a13f814a
      Taehee Yoo 提交于
      The nft_set_gc_batch_check() checks whether gc buffer is full.
      If gc buffer is full, gc buffer is released by
      the nft_set_gc_batch_complete() internally.
      In case of rbtree, the rb_erase() should be called before calling the
      nft_set_gc_batch_complete(). therefore the rb_erase() should
      be called before calling the nft_set_gc_batch_check() too.
      
      test commands:
         table ip filter {
      	   set set1 {
      		   type ipv4_addr; flags interval, timeout;
      		   gc-interval 10s;
      		   timeout 1s;
      		   elements = {
      			   1-2,
      			   3-4,
      			   5-6,
      			   ...
      			   10000-10001,
      		   }
      	   }
         }
         %nft -f test.nft
      
      splat looks like:
      [  430.273885] kasan: GPF could be caused by NULL-ptr deref or user memory access
      [  430.282158] general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI
      [  430.283116] CPU: 1 PID: 190 Comm: kworker/1:2 Tainted: G    B             4.18.0+ #7
      [  430.283116] Workqueue: events_power_efficient nft_rbtree_gc [nf_tables_set]
      [  430.313559] RIP: 0010:rb_next+0x81/0x130
      [  430.313559] Code: 08 49 bd 00 00 00 00 00 fc ff df 48 bb 00 00 00 00 00 fc ff df 48 85 c0 75 05 eb 58 48 89 d4
      [  430.313559] RSP: 0018:ffff88010cdb7680 EFLAGS: 00010207
      [  430.313559] RAX: 0000000000b84854 RBX: dffffc0000000000 RCX: ffffffff83f01973
      [  430.313559] RDX: 000000000017090c RSI: 0000000000000008 RDI: 0000000000b84864
      [  430.313559] RBP: ffff8801060d4588 R08: fffffbfff09bc349 R09: fffffbfff09bc349
      [  430.313559] R10: 0000000000000001 R11: fffffbfff09bc348 R12: ffff880100f081a8
      [  430.313559] R13: dffffc0000000000 R14: ffff880100ff8688 R15: dffffc0000000000
      [  430.313559] FS:  0000000000000000(0000) GS:ffff88011b400000(0000) knlGS:0000000000000000
      [  430.313559] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  430.313559] CR2: 0000000001551008 CR3: 000000005dc16000 CR4: 00000000001006e0
      [  430.313559] Call Trace:
      [  430.313559]  nft_rbtree_gc+0x112/0x5c0 [nf_tables_set]
      [  430.313559]  process_one_work+0xc13/0x1ec0
      [  430.313559]  ? _raw_spin_unlock_irq+0x29/0x40
      [  430.313559]  ? pwq_dec_nr_in_flight+0x3c0/0x3c0
      [  430.313559]  ? set_load_weight+0x270/0x270
      [  430.313559]  ? __switch_to_asm+0x34/0x70
      [  430.313559]  ? __switch_to_asm+0x40/0x70
      [  430.313559]  ? __switch_to_asm+0x34/0x70
      [  430.313559]  ? __switch_to_asm+0x34/0x70
      [  430.313559]  ? __switch_to_asm+0x40/0x70
      [  430.313559]  ? __switch_to_asm+0x34/0x70
      [  430.313559]  ? __switch_to_asm+0x40/0x70
      [  430.313559]  ? __switch_to_asm+0x34/0x70
      [  430.313559]  ? __switch_to_asm+0x34/0x70
      [  430.313559]  ? __switch_to_asm+0x40/0x70
      [  430.313559]  ? __switch_to_asm+0x34/0x70
      [  430.313559]  ? __schedule+0x6d3/0x1f50
      [  430.313559]  ? find_held_lock+0x39/0x1c0
      [  430.313559]  ? __sched_text_start+0x8/0x8
      [  430.313559]  ? cyc2ns_read_end+0x10/0x10
      [  430.313559]  ? save_trace+0x300/0x300
      [  430.313559]  ? sched_clock_local+0xd4/0x140
      [  430.313559]  ? find_held_lock+0x39/0x1c0
      [  430.313559]  ? worker_thread+0x353/0x1120
      [  430.313559]  ? worker_thread+0x353/0x1120
      [  430.313559]  ? lock_contended+0xe70/0xe70
      [  430.313559]  ? __lock_acquire+0x4500/0x4500
      [  430.535635]  ? do_raw_spin_unlock+0xa5/0x330
      [  430.535635]  ? do_raw_spin_trylock+0x101/0x1a0
      [  430.535635]  ? do_raw_spin_lock+0x1f0/0x1f0
      [  430.535635]  ? _raw_spin_lock_irq+0x10/0x70
      [  430.535635]  worker_thread+0x15d/0x1120
      [ ... ]
      
      Fixes: 8d8540c4 ("netfilter: nft_set_rbtree: add timeout support")
      Signed-off-by: NTaehee Yoo <ap420073@gmail.com>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      a13f814a
    • D
      rxrpc: Fix error distribution · f3344303
      David Howells 提交于
      Fix error distribution by immediately delivering the errors to all the
      affected calls rather than deferring them to a worker thread.  The problem
      with the latter is that retries and things can happen in the meantime when we
      want to stop that sooner.
      
      To this end:
      
       (1) Stop the error distributor from removing calls from the error_targets
           list so that peer->lock isn't needed to synchronise against other adds
           and removals.
      
       (2) Require the peer's error_targets list to be accessed with RCU, thereby
           avoiding the need to take peer->lock over distribution.
      
       (3) Don't attempt to affect a call's state if it is already marked complete.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      f3344303
    • D
      rxrpc: Fix transport sockopts to get IPv4 errors on an IPv6 socket · 37a675e7
      David Howells 提交于
      It seems that enabling IPV6_RECVERR on an IPv6 socket doesn't also turn on
      IP_RECVERR, so neither local errors nor ICMP-transported remote errors from
      IPv4 peer addresses are returned to the AF_RXRPC protocol.
      
      Make the sockopt setting code in rxrpc_open_socket() fall through from the
      AF_INET6 case to the AF_INET case to turn on all the AF_INET options too in
      the AF_INET6 case.
      
      Fixes: f2aeed3a ("rxrpc: Fix error reception on AF_INET6 sockets")
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      37a675e7
    • D
      rxrpc: Make service call handling more robust · 0099dc58
      David Howells 提交于
      Make the following changes to improve the robustness of the code that sets
      up a new service call:
      
       (1) Cache the rxrpc_sock struct obtained in rxrpc_data_ready() to do a
           service ID check and pass that along to rxrpc_new_incoming_call().
           This means that I can remove the check from rxrpc_new_incoming_call()
           without the need to worry about the socket attached to the local
           endpoint getting replaced - which would invalidate the check.
      
       (2) Cache the rxrpc_peer struct, thereby allowing the peer search to be
           done once.  The peer is passed to rxrpc_new_incoming_call(), thereby
           saving the need to repeat the search.
      
           This also reduces the possibility of rxrpc_publish_service_conn()
           BUG()'ing due to the detection of a duplicate connection, despite the
           initial search done by rxrpc_find_connection_rcu() having turned up
           nothing.
      
           This BUG() shouldn't ever get hit since rxrpc_data_ready() *should* be
           non-reentrant and the result of the initial search should still hold
           true, but it has proven possible to hit.
      
           I *think* this may be due to __rxrpc_lookup_peer_rcu() cutting short
           the iteration over the hash table if it finds a matching peer with a
           zero usage count, but I don't know for sure since it's only ever been
           hit once that I know of.
      
           Another possibility is that a bug in rxrpc_data_ready() that checked
           the wrong byte in the header for the RXRPC_CLIENT_INITIATED flag
           might've let through a packet that caused a spurious and invalid call
           to be set up.  That is addressed in another patch.
      
       (3) Fix __rxrpc_lookup_peer_rcu() to skip peer records that have a zero
           usage count rather than stopping and returning not found, just in case
           there's another peer record behind it in the bucket.
      
       (4) Don't search the peer records in rxrpc_alloc_incoming_call(), but
           rather either use the peer cached in (2) or, if one wasn't found,
           preemptively install a new one.
      
      Fixes: 8496af50 ("rxrpc: Use RCU to access a peer's service connection tree")
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      0099dc58
    • D
      rxrpc: Improve up-front incoming packet checking · 403fc2a1
      David Howells 提交于
      Do more up-front checking on incoming packets to weed out invalid ones and
      also ones aimed at services that we don't support.
      
      Whilst we're at it, replace the clearing of call and skew if we don't find
      a connection with just initialising the variables to zero at the top of the
      function.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      403fc2a1
    • D
      rxrpc: Emit BUSY packets when supposed to rather than ABORTs · ece64fec
      David Howells 提交于
      In the input path, a received sk_buff can be marked for rejection by
      setting RXRPC_SKB_MARK_* in skb->mark and, if needed, some auxiliary data
      (such as an abort code) in skb->priority.  The rejection is handled by
      queueing the sk_buff up for dealing with in process context.  The output
      code reads the mark and priority and, theoretically, generates an
      appropriate response packet.
      
      However, if RXRPC_SKB_MARK_BUSY is set, this isn't noticed and an ABORT
      message with a random abort code is generated (since skb->priority wasn't
      set to anything).
      
      Fix this by outputting the appropriate sort of packet.
      
      Also, whilst we're at it, most of the marks are no longer used, so remove
      them and rename the remaining two to something more obvious.
      
      Fixes: 248f219c ("rxrpc: Rewrite the data and ack handling code")
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      ece64fec
    • D
      rxrpc: Fix RTT gathering · b604dd98
      David Howells 提交于
      Fix RTT information gathering in AF_RXRPC by the following means:
      
       (1) Enable Rx timestamping on the transport socket with SO_TIMESTAMPNS.
      
       (2) If the sk_buff doesn't have a timestamp set when rxrpc_data_ready()
           collects it, set it at that point.
      
       (3) Allow ACKs to be requested on the last packet of a client call, but
           not a service call.  We need to be careful lest we undo:
      
      	bf7d620a
      	Author: David Howells <dhowells@redhat.com>
      	Date:   Thu Oct 6 08:11:51 2016 +0100
      	rxrpc: Don't request an ACK on the last DATA packet of a call's Tx phase
      
           but that only really applies to service calls that we're handling,
           since the client side gets to send the final ACK (or not).
      
       (4) When about to transmit an ACK or DATA packet, record the Tx timestamp
           before only; don't update the timestamp afterwards.
      
       (5) Switch the ordering between recording the serial and recording the
           timestamp to always set the serial number first.  The serial number
           shouldn't be seen referenced by an ACK packet until we've transmitted
           the packet bearing it - so in the Rx path, we don't need the timestamp
           until we've checked the serial number.
      
      Fixes: cf1a6474 ("rxrpc: Add per-peer RTT tracker")
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      b604dd98
    • D
      rxrpc: Fix checks as to whether we should set up a new call · dc71db34
      David Howells 提交于
      There's a check in rxrpc_data_ready() that's checking the CLIENT_INITIATED
      flag in the packet type field rather than in the packet flags field.
      
      Fix this by creating a pair of helper functions to check whether the packet
      is going to the client or to the server and use them generally.
      
      Fixes: 248f219c ("rxrpc: Rewrite the data and ack handling code")
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      dc71db34
  5. 27 9月, 2018 5 次提交
  6. 26 9月, 2018 5 次提交
  7. 25 9月, 2018 2 次提交
    • P
      ip_tunnel: be careful when accessing the inner header · ccfec9e5
      Paolo Abeni 提交于
      Cong noted that we need the same checks introduced by commit 76c0ddd8
      ("ip6_tunnel: be careful when accessing the inner header")
      even for ipv4 tunnels.
      
      Fixes: c5441932 ("GRE: Refactor GRE tunneling code.")
      Suggested-by: NCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ccfec9e5
    • S
      mpls: allow routes on ip6gre devices · d8e2262a
      Saif Hasan 提交于
      Summary:
      
      This appears to be necessary and sufficient change to enable `MPLS` on
      `ip6gre` tunnels (RFC4023).
      
      This diff allows IP6GRE devices to be recognized by MPLS kernel module
      and hence user can configure interface to accept packets with mpls
      headers as well setup mpls routes on them.
      
      Test Plan:
      
      Test plan consists of multiple containers connected via GRE-V6 tunnel.
      Then carrying out testing steps as below.
      
      - Carry out necessary sysctl settings on all containers
      
      ```
      sysctl -w net.mpls.platform_labels=65536
      sysctl -w net.mpls.ip_ttl_propagate=1
      sysctl -w net.mpls.conf.lo.input=1
      ```
      
      - Establish IP6GRE tunnels
      
      ```
      ip -6 tunnel add name if_1_2_1 mode ip6gre \
        local 2401:db00:21:6048:feed:0::1 \
        remote 2401:db00:21:6048:feed:0::2 key 1
      ip link set dev if_1_2_1 up
      sysctl -w net.mpls.conf.if_1_2_1.input=1
      ip -4 addr add 169.254.0.2/31 dev if_1_2_1 scope link
      
      ip -6 tunnel add name if_1_3_1 mode ip6gre \
        local 2401:db00:21:6048:feed:0::1 \
        remote 2401:db00:21:6048:feed:0::3 key 1
      ip link set dev if_1_3_1 up
      sysctl -w net.mpls.conf.if_1_3_1.input=1
      ip -4 addr add 169.254.0.4/31 dev if_1_3_1 scope link
      ```
      
      - Install MPLS encap rules on node-1 towards node-2
      
      ```
      ip route add 192.168.0.11/32 nexthop encap mpls 32/64 \
        via inet 169.254.0.3 dev if_1_2_1
      ```
      
      - Install MPLS forwarding rules on node-2 and node-3
      ```
      // node2
      ip -f mpls route add 32 via inet 169.254.0.7 dev if_2_4_1
      
      // node3
      ip -f mpls route add 64 via inet 169.254.0.12 dev if_4_3_1
      ```
      
      - Ping 192.168.0.11 (node4) from 192.168.0.1 (node1) (where routing
        towards 192.168.0.1 is via IP route directly towards node1 from node4)
      ```
      ping 192.168.0.11
      ```
      
      - tcpdump on interface to capture ping packets wrapped within MPLS
        header which inturn wrapped within IP6GRE header
      
      ```
      16:43:41.121073 IP6
        2401:db00:21:6048:feed::1 > 2401:db00:21:6048:feed::2:
        DSTOPT GREv0, key=0x1, length 100:
        MPLS (label 32, exp 0, ttl 255) (label 64, exp 0, [S], ttl 255)
        IP 192.168.0.1 > 192.168.0.11:
        ICMP echo request, id 1208, seq 45, length 64
      
      0x0000:  6000 2cdb 006c 3c3f 2401 db00 0021 6048  `.,..l<?$....!`H
      0x0010:  feed 0000 0000 0001 2401 db00 0021 6048  ........$....!`H
      0x0020:  feed 0000 0000 0002 2f00 0401 0401 0100  ......../.......
      0x0030:  2000 8847 0000 0001 0002 00ff 0004 01ff  ...G............
      0x0040:  4500 0054 3280 4000 ff01 c7cb c0a8 0001  E..T2.@.........
      0x0050:  c0a8 000b 0800 a8d7 04b8 002d 2d3c a05b  ...........--<.[
      0x0060:  0000 0000 bcd8 0100 0000 0000 1011 1213  ................
      0x0070:  1415 1617 1819 1a1b 1c1d 1e1f 2021 2223  .............!"#
      0x0080:  2425 2627 2829 2a2b 2c2d 2e2f 3031 3233  $%&'()*+,-./0123
      0x0090:  3435 3637                                4567
      ```
      Signed-off-by: NSaif Hasan <has@fb.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d8e2262a
  8. 24 9月, 2018 2 次提交
    • E
      netpoll: make ndo_poll_controller() optional · ac3d9dd0
      Eric Dumazet 提交于
      As diagnosed by Song Liu, ndo_poll_controller() can
      be very dangerous on loaded hosts, since the cpu
      calling ndo_poll_controller() might steal all NAPI
      contexts (for all RX/TX queues of the NIC). This capture
      can last for unlimited amount of time, since one
      cpu is generally not able to drain all the queues under load.
      
      It seems that all networking drivers that do use NAPI
      for their TX completions, should not provide a ndo_poll_controller().
      
      NAPI drivers have netpoll support already handled
      in core networking stack, since netpoll_poll_dev()
      uses poll_napi(dev) to iterate through registered
      NAPI contexts for a device.
      
      This patch allows netpoll_poll_dev() to process NAPI
      contexts even for drivers not providing ndo_poll_controller(),
      allowing for following patches in NAPI drivers.
      
      Also we export netpoll_poll_dev() so that it can be called
      by bonding/team drivers in following patches.
      Reported-by: NSong Liu <songliubraving@fb.com>
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Tested-by: NSong Liu <songliubraving@fb.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ac3d9dd0
    • D
      rds: Fix build regression. · 16fdf8ba
      David S. Miller 提交于
      Use DECLARE_* not DEFINE_*
      
      Fixes: 8360ed67 ("RDS: IB: Use DEFINE_PER_CPU_SHARED_ALIGNED for rds_ib_stats")
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      16fdf8ba
  9. 23 9月, 2018 1 次提交
  10. 22 9月, 2018 4 次提交
    • N
      RDS: IB: Use DEFINE_PER_CPU_SHARED_ALIGNED for rds_ib_stats · 8360ed67
      Nathan Chancellor 提交于
      Clang warns when two declarations' section attributes don't match.
      
      net/rds/ib_stats.c:40:1: warning: section does not match previous
      declaration [-Wsection]
      DEFINE_PER_CPU_SHARED_ALIGNED(struct rds_ib_statistics, rds_ib_stats);
      ^
      ./include/linux/percpu-defs.h:142:2: note: expanded from macro
      'DEFINE_PER_CPU_SHARED_ALIGNED'
              DEFINE_PER_CPU_SECTION(type, name,
      PER_CPU_SHARED_ALIGNED_SECTION) \
              ^
      ./include/linux/percpu-defs.h:93:9: note: expanded from macro
      'DEFINE_PER_CPU_SECTION'
              extern __PCPU_ATTRS(sec) __typeof__(type) name;
      \
                     ^
      ./include/linux/percpu-defs.h:49:26: note: expanded from macro
      '__PCPU_ATTRS'
              __percpu __attribute__((section(PER_CPU_BASE_SECTION sec)))
      \
                                      ^
      net/rds/ib.h:446:1: note: previous attribute is here
      DECLARE_PER_CPU(struct rds_ib_statistics, rds_ib_stats);
      ^
      ./include/linux/percpu-defs.h:111:2: note: expanded from macro
      'DECLARE_PER_CPU'
              DECLARE_PER_CPU_SECTION(type, name, "")
              ^
      ./include/linux/percpu-defs.h:87:9: note: expanded from macro
      'DECLARE_PER_CPU_SECTION'
              extern __PCPU_ATTRS(sec) __typeof__(type) name
                     ^
      ./include/linux/percpu-defs.h:49:26: note: expanded from macro
      '__PCPU_ATTRS'
              __percpu __attribute__((section(PER_CPU_BASE_SECTION sec)))
      \
                                      ^
      1 warning generated.
      
      The initial definition was added in commit ec16227e ("RDS/IB:
      Infiniband transport") and the cache aligned definition was added in
      commit e6babe4c ("RDS/IB: Stats and sysctls") right after. The
      definition probably should have been updated in net/rds/ib.h, which is
      what this patch does.
      
      Link: https://github.com/ClangBuiltLinux/linux/issues/114Signed-off-by: NNathan Chancellor <natechancellor@gmail.com>
      Acked-by: NSantosh Shilimkar <santosh.shilimkar@oracle.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8360ed67
    • D
      devlink: double free in devlink_resource_fill() · 83fe9a96
      Dan Carpenter 提交于
      Smatch reports that devlink_dpipe_send_and_alloc_skb() frees the skb
      on error so this is a double free.  We fixed a bunch of these bugs in
      commit 7fe4d6dc ("devlink: Remove redundant free on error path") but
      we accidentally overlooked this one.
      
      Fixes: d9f9b9a4 ("devlink: Add support for resource abstraction")
      Signed-off-by: NDan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      83fe9a96
    • J
      net/ipv6: Display all addresses in output of /proc/net/if_inet6 · 86f9bd1f
      Jeff Barnhill 提交于
      The backend handling for /proc/net/if_inet6 in addrconf.c doesn't properly
      handle starting/stopping the iteration.  The problem is that at some point
      during the iteration, an overflow is detected and the process is
      subsequently stopped.  The item being shown via seq_printf() when the
      overflow occurs is not actually shown, though.  When start() is
      subsequently called to resume iterating, it returns the next item, and
      thus the item that was being processed when the overflow occurred never
      gets printed.
      
      Alter the meaning of the private data member "offset".  Currently, when it
      is not 0 (which only happens at the very beginning), "offset" represents
      the next hlist item to be printed.  After this change, "offset" always
      represents the current item.
      
      This is also consistent with the private data member "bucket", which
      represents the current bucket, and also the use of "pos" as defined in
      seq_file.txt:
          The pos passed to start() will always be either zero, or the most
          recent pos used in the previous session.
      Signed-off-by: NJeff Barnhill <0xeffeff@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      86f9bd1f
    • S
      netlabel: check for IPV4MASK in addrinfo_get · f88b4c01
      Sean Tranchetti 提交于
      netlbl_unlabel_addrinfo_get() assumes that if it finds the
      NLBL_UNLABEL_A_IPV4ADDR attribute, it must also have the
      NLBL_UNLABEL_A_IPV4MASK attribute as well. However, this is
      not necessarily the case as the current checks in
      netlbl_unlabel_staticadd() and friends are not sufficent to
      enforce this.
      
      If passed a netlink message with NLBL_UNLABEL_A_IPV4ADDR,
      NLBL_UNLABEL_A_IPV6ADDR, and NLBL_UNLABEL_A_IPV6MASK attributes,
      these functions will all call netlbl_unlabel_addrinfo_get() which
      will then attempt dereference NULL when fetching the non-existent
      NLBL_UNLABEL_A_IPV4MASK attribute:
      
      Unable to handle kernel NULL pointer dereference at virtual address 0
      Process unlab (pid: 31762, stack limit = 0xffffff80502d8000)
      Call trace:
      	netlbl_unlabel_addrinfo_get+0x44/0xd8
      	netlbl_unlabel_staticremovedef+0x98/0xe0
      	genl_rcv_msg+0x354/0x388
      	netlink_rcv_skb+0xac/0x118
      	genl_rcv+0x34/0x48
      	netlink_unicast+0x158/0x1f0
      	netlink_sendmsg+0x32c/0x338
      	sock_sendmsg+0x44/0x60
      	___sys_sendmsg+0x1d0/0x2a8
      	__sys_sendmsg+0x64/0xb4
      	SyS_sendmsg+0x34/0x4c
      	el0_svc_naked+0x34/0x38
      Code: 51001149 7100113f 540000a0 f9401508 (79400108)
      ---[ end trace f6438a488e737143 ]---
      Kernel panic - not syncing: Fatal exception
      Signed-off-by: NSean Tranchetti <stranche@codeaurora.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f88b4c01
  11. 21 9月, 2018 4 次提交
    • X
      sctp: update dst pmtu with the correct daddr · d7ab5cdc
      Xin Long 提交于
      When processing pmtu update from an icmp packet, it calls .update_pmtu
      with sk instead of skb in sctp_transport_update_pmtu.
      
      However for sctp, the daddr in the transport might be different from
      inet_sock->inet_daddr or sk->sk_v6_daddr, which is used to update or
      create the route cache. The incorrect daddr will cause a different
      route cache created for the path.
      
      So before calling .update_pmtu, inet_sock->inet_daddr/sk->sk_v6_daddr
      should be updated with the daddr in the transport, and update it back
      after it's done.
      
      The issue has existed since route exceptions introduction.
      
      Fixes: 4895c771 ("ipv4: Add FIB nexthop exceptions.")
      Reported-by: ian.periam@dialogic.com
      Signed-off-by: NXin Long <lucien.xin@gmail.com>
      Acked-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d7ab5cdc
    • Z
      netfilter: conntrack: get rid of double sizeof · 346fa83d
      zhong jiang 提交于
      sizeof(sizeof()) is quite strange and does not seem to be what
      is wanted here.
      
      The issue is detected with the help of Coccinelle.
      
      Fixes: 39215846 ("netfilter: conntrack: remove nlattr_size pointer from l4proto trackers")
      Signed-off-by: Nzhong jiang <zhongjiang@huawei.com>
      Acked-by: NFlorian Westphal <fw@strlen.de>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      346fa83d
    • S
      netfilter: nft_osf: use enum nft_data_types for nft_validate_register_store · bab43449
      Stefan Agner 提交于
      The function nft_validate_register_store requires a struct of type
      struct nft_data_types. NFTA_DATA_VALUE is of type enum
      nft_verdict_attributes. Pass the correct enum type.
      
      This fixes a warning seen with Clang:
        net/netfilter/nft_osf.c:52:8: warning: implicit conversion from
          enumeration type 'enum nft_data_attributes' to different enumeration
          type 'enum nft_data_types' [-Wenum-conversion]
                                                NFTA_DATA_VALUE, NFT_OSF_MAXGENRELEN);
                                                ^~~~~~~~~~~~~~~
      
      Fixes: b96af92d ("netfilter: nf_tables: implement Passive OS fingerprint module in nft_osf")
      Link: https://github.com/ClangBuiltLinux/linux/issues/103Signed-off-by: NStefan Agner <stefan@agner.ch>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      bab43449
    • D
      netfilter: bridge: Don't sabotage nf_hook calls from an l3mdev · a173f066
      David Ahern 提交于
      For starters, the bridge netfilter code registers operations that
      are invoked any time nh_hook is called. Specifically, ip_sabotage_in
      watches for nested calls for NF_INET_PRE_ROUTING when a bridge is in
      the stack.
      
      Packet wise, the bridge netfilter hook runs first. br_nf_pre_routing
      allocates nf_bridge, sets in_prerouting to 1 and calls NF_HOOK for
      NF_INET_PRE_ROUTING. It's finish function, br_nf_pre_routing_finish,
      then resets in_prerouting flag to 0 and the packet continues up the
      stack. The packet eventually makes it to the VRF driver and it invokes
      nf_hook for NF_INET_PRE_ROUTING in case any rules have been added against
      the vrf device.
      
      Because of the registered operations the call to nf_hook causes
      ip_sabotage_in to be invoked. That function sees the nf_bridge on the
      skb and that in_prerouting is not set. Thinking it is an invalid nested
      call it steals (drops) the packet.
      
      Update ip_sabotage_in to recognize that the bridge or one of its upper
      devices (e.g., vlan) can be enslaved to a VRF (L3 master device) and
      allow the packet to go through the nf_hook a second time.
      
      Fixes: 73e20b76 ("net: vrf: Add support for PREROUTING rules on vrf device")
      Reported-by: ND'Souza, Nelson <ndsouza@ciena.com>
      Signed-off-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      a173f066
  12. 20 9月, 2018 3 次提交
    • J
      smc: generic netlink family should be __ro_after_init · 56ce3c5a
      Johannes Berg 提交于
      The generic netlink family is only initialized during module init,
      so it should be __ro_after_init like all other generic netlink
      families.
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      56ce3c5a
    • S
      xfrm: validate template mode · 32bf94fb
      Sean Tranchetti 提交于
      XFRM mode parameters passed as part of the user templates
      in the IP_XFRM_POLICY are never properly validated. Passing
      values other than valid XFRM modes can cause stack-out-of-bounds
      reads to occur later in the XFRM processing:
      
      [  140.535608] ================================================================
      [  140.543058] BUG: KASAN: stack-out-of-bounds in xfrm_state_find+0x17e4/0x1cc4
      [  140.550306] Read of size 4 at addr ffffffc0238a7a58 by task repro/5148
      [  140.557369]
      [  140.558927] Call trace:
      [  140.558936] dump_backtrace+0x0/0x388
      [  140.558940] show_stack+0x24/0x30
      [  140.558946] __dump_stack+0x24/0x2c
      [  140.558949] dump_stack+0x8c/0xd0
      [  140.558956] print_address_description+0x74/0x234
      [  140.558960] kasan_report+0x240/0x264
      [  140.558963] __asan_report_load4_noabort+0x2c/0x38
      [  140.558967] xfrm_state_find+0x17e4/0x1cc4
      [  140.558971] xfrm_resolve_and_create_bundle+0x40c/0x1fb8
      [  140.558975] xfrm_lookup+0x238/0x1444
      [  140.558977] xfrm_lookup_route+0x48/0x11c
      [  140.558984] ip_route_output_flow+0x88/0xc4
      [  140.558991] raw_sendmsg+0xa74/0x266c
      [  140.558996] inet_sendmsg+0x258/0x3b0
      [  140.559002] sock_sendmsg+0xbc/0xec
      [  140.559005] SyS_sendto+0x3a8/0x5a8
      [  140.559008] el0_svc_naked+0x34/0x38
      [  140.559009]
      [  140.592245] page dumped because: kasan: bad access detected
      [  140.597981] page_owner info is not active (free page?)
      [  140.603267]
      [  140.653503] ================================================================
      Signed-off-by: NSean Tranchetti <stranche@codeaurora.org>
      Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>
      32bf94fb
    • P
      ip6_tunnel: be careful when accessing the inner header · 76c0ddd8
      Paolo Abeni 提交于
      the ip6 tunnel xmit ndo assumes that the processed skb always
      contains an ip[v6] header, but syzbot has found a way to send
      frames that fall short of this assumption, leading to the following splat:
      
      BUG: KMSAN: uninit-value in ip6ip6_tnl_xmit net/ipv6/ip6_tunnel.c:1307
      [inline]
      BUG: KMSAN: uninit-value in ip6_tnl_start_xmit+0x7d2/0x1ef0
      net/ipv6/ip6_tunnel.c:1390
      CPU: 0 PID: 4504 Comm: syz-executor558 Not tainted 4.16.0+ #87
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
      Google 01/01/2011
      Call Trace:
        __dump_stack lib/dump_stack.c:17 [inline]
        dump_stack+0x185/0x1d0 lib/dump_stack.c:53
        kmsan_report+0x142/0x240 mm/kmsan/kmsan.c:1067
        __msan_warning_32+0x6c/0xb0 mm/kmsan/kmsan_instr.c:683
        ip6ip6_tnl_xmit net/ipv6/ip6_tunnel.c:1307 [inline]
        ip6_tnl_start_xmit+0x7d2/0x1ef0 net/ipv6/ip6_tunnel.c:1390
        __netdev_start_xmit include/linux/netdevice.h:4066 [inline]
        netdev_start_xmit include/linux/netdevice.h:4075 [inline]
        xmit_one net/core/dev.c:3026 [inline]
        dev_hard_start_xmit+0x5f1/0xc70 net/core/dev.c:3042
        __dev_queue_xmit+0x27ee/0x3520 net/core/dev.c:3557
        dev_queue_xmit+0x4b/0x60 net/core/dev.c:3590
        packet_snd net/packet/af_packet.c:2944 [inline]
        packet_sendmsg+0x7c70/0x8a30 net/packet/af_packet.c:2969
        sock_sendmsg_nosec net/socket.c:630 [inline]
        sock_sendmsg net/socket.c:640 [inline]
        ___sys_sendmsg+0xec0/0x1310 net/socket.c:2046
        __sys_sendmmsg+0x42d/0x800 net/socket.c:2136
        SYSC_sendmmsg+0xc4/0x110 net/socket.c:2167
        SyS_sendmmsg+0x63/0x90 net/socket.c:2162
        do_syscall_64+0x309/0x430 arch/x86/entry/common.c:287
        entry_SYSCALL_64_after_hwframe+0x3d/0xa2
      RIP: 0033:0x441819
      RSP: 002b:00007ffe58ee8268 EFLAGS: 00000213 ORIG_RAX: 0000000000000133
      RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 0000000000441819
      RDX: 0000000000000002 RSI: 0000000020000100 RDI: 0000000000000003
      RBP: 00000000006cd018 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000213 R12: 0000000000402510
      R13: 00000000004025a0 R14: 0000000000000000 R15: 0000000000000000
      
      Uninit was created at:
        kmsan_save_stack_with_flags mm/kmsan/kmsan.c:278 [inline]
        kmsan_internal_poison_shadow+0xb8/0x1b0 mm/kmsan/kmsan.c:188
        kmsan_kmalloc+0x94/0x100 mm/kmsan/kmsan.c:314
        kmsan_slab_alloc+0x11/0x20 mm/kmsan/kmsan.c:321
        slab_post_alloc_hook mm/slab.h:445 [inline]
        slab_alloc_node mm/slub.c:2737 [inline]
        __kmalloc_node_track_caller+0xaed/0x11c0 mm/slub.c:4369
        __kmalloc_reserve net/core/skbuff.c:138 [inline]
        __alloc_skb+0x2cf/0x9f0 net/core/skbuff.c:206
        alloc_skb include/linux/skbuff.h:984 [inline]
        alloc_skb_with_frags+0x1d4/0xb20 net/core/skbuff.c:5234
        sock_alloc_send_pskb+0xb56/0x1190 net/core/sock.c:2085
        packet_alloc_skb net/packet/af_packet.c:2803 [inline]
        packet_snd net/packet/af_packet.c:2894 [inline]
        packet_sendmsg+0x6454/0x8a30 net/packet/af_packet.c:2969
        sock_sendmsg_nosec net/socket.c:630 [inline]
        sock_sendmsg net/socket.c:640 [inline]
        ___sys_sendmsg+0xec0/0x1310 net/socket.c:2046
        __sys_sendmmsg+0x42d/0x800 net/socket.c:2136
        SYSC_sendmmsg+0xc4/0x110 net/socket.c:2167
        SyS_sendmmsg+0x63/0x90 net/socket.c:2162
        do_syscall_64+0x309/0x430 arch/x86/entry/common.c:287
        entry_SYSCALL_64_after_hwframe+0x3d/0xa2
      
      This change addresses the issue adding the needed check before
      accessing the inner header.
      
      The ipv4 side of the issue is apparently there since the ipv4 over ipv6
      initial support, and the ipv6 side predates git history.
      
      Fixes: c4d3efaf ("[IPV6] IP6TUNNEL: Add support to IPv4 over IPv6 tunnel.")
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Reported-by: syzbot+3fde91d4d394747d6db4@syzkaller.appspotmail.com
      Tested-by: NAlexander Potapenko <glider@google.com>
      Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      76c0ddd8