1. 27 11月, 2014 2 次提交
  2. 26 11月, 2014 3 次提交
  3. 25 11月, 2014 3 次提交
  4. 24 11月, 2014 2 次提交
    • L
      ip_tunnel: the lack of vti_link_ops' dellink() cause kernel panic · 20ea60ca
      lucien 提交于
      Now the vti_link_ops do not point the .dellink, for fb tunnel device
      (ip_vti0), the net_device will be removed as the default .dellink is
      unregister_netdevice_queue,but the tunnel still in the tunnel list,
      then if we add a new vti tunnel, in ip_tunnel_find():
      
              hlist_for_each_entry_rcu(t, head, hash_node) {
                      if (local == t->parms.iph.saddr &&
                          remote == t->parms.iph.daddr &&
                          link == t->parms.link &&
      ==>                 type == t->dev->type &&
                          ip_tunnel_key_match(&t->parms, flags, key))
                              break;
              }
      
      the panic will happen, cause dev of ip_tunnel *t is null:
      [ 3835.072977] IP: [<ffffffffa04103fd>] ip_tunnel_find+0x9d/0xc0 [ip_tunnel]
      [ 3835.073008] PGD b2c21067 PUD b7277067 PMD 0
      [ 3835.073008] Oops: 0000 [#1] SMP
      .....
      [ 3835.073008] Stack:
      [ 3835.073008]  ffff8800b72d77f0 ffffffffa0411924 ffff8800bb956000 ffff8800b72d78e0
      [ 3835.073008]  ffff8800b72d78a0 0000000000000000 ffffffffa040d100 ffff8800b72d7858
      [ 3835.073008]  ffffffffa040b2e3 0000000000000000 0000000000000000 0000000000000000
      [ 3835.073008] Call Trace:
      [ 3835.073008]  [<ffffffffa0411924>] ip_tunnel_newlink+0x64/0x160 [ip_tunnel]
      [ 3835.073008]  [<ffffffffa040b2e3>] vti_newlink+0x43/0x70 [ip_vti]
      [ 3835.073008]  [<ffffffff8150d4da>] rtnl_newlink+0x4fa/0x5f0
      [ 3835.073008]  [<ffffffff812f68bb>] ? nla_strlcpy+0x5b/0x70
      [ 3835.073008]  [<ffffffff81508fb0>] ? rtnl_link_ops_get+0x40/0x60
      [ 3835.073008]  [<ffffffff8150d11f>] ? rtnl_newlink+0x13f/0x5f0
      [ 3835.073008]  [<ffffffff81509cf4>] rtnetlink_rcv_msg+0xa4/0x270
      [ 3835.073008]  [<ffffffff8126adf5>] ? sock_has_perm+0x75/0x90
      [ 3835.073008]  [<ffffffff81509c50>] ? rtnetlink_rcv+0x30/0x30
      [ 3835.073008]  [<ffffffff81529e39>] netlink_rcv_skb+0xa9/0xc0
      [ 3835.073008]  [<ffffffff81509c48>] rtnetlink_rcv+0x28/0x30
      ....
      
      modprobe ip_vti
      ip link del ip_vti0 type vti
      ip link add ip_vti0 type vti
      rmmod ip_vti
      
      do that one or more times, kernel will panic.
      
      fix it by assigning ip_tunnel_dellink to vti_link_ops' dellink, in
      which we skip the unregister of fb tunnel device. do the same on ip6_vti.
      Signed-off-by: NXin Long <lucien.xin@gmail.com>
      Signed-off-by: NCong Wang <cwang@twopensource.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      20ea60ca
    • A
      ipv6: Do not treat a GSO_TCPV4 request from UDP tunnel over IPv6 as invalid · b6fef4c6
      Alexander Duyck 提交于
      This patch adds SKB_GSO_TCPV4 to the list of supported GSO types handled by
      the IPv6 GSO offloads.  Without this change VXLAN tunnels running over IPv6
      do not currently handle IPv4 TCP TSO requests correctly and end up handing
      the non-segmented frame off to the device.
      
      Below is the before and after for a simple netperf TCP_STREAM test between
      two endpoints tunneling IPv4 over a VXLAN tunnel running on IPv6 on top of
      a 1Gb/s network adapter.
      
      Recv   Send    Send
      Socket Socket  Message  Elapsed
      Size   Size    Size     Time     Throughput
      bytes  bytes   bytes    secs.    10^6bits/sec
      
       87380  16384  16384    10.29       0.88      Before
       87380  16384  16384    10.03     895.69      After
      Signed-off-by: NAlexander Duyck <alexander.h.duyck@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b6fef4c6
  5. 22 11月, 2014 2 次提交
    • C
      tcp: Restore RFC5961-compliant behavior for SYN packets · 0c228e83
      Calvin Owens 提交于
      Commit c3ae62af ("tcp: should drop incoming frames without ACK
      flag set") was created to mitigate a security vulnerability in which a
      local attacker is able to inject data into locally-opened sockets by
      using TCP protocol statistics in procfs to quickly find the correct
      sequence number.
      
      This broke the RFC5961 requirement to send a challenge ACK in response
      to spurious RST packets, which was subsequently fixed by commit
      7b514a88 ("tcp: accept RST without ACK flag").
      
      Unfortunately, the RFC5961 requirement that spurious SYN packets be
      handled in a similar manner remains broken.
      
      RFC5961 section 4 states that:
      
         ... the handling of the SYN in the synchronized state SHOULD be
         performed as follows:
      
         1) If the SYN bit is set, irrespective of the sequence number, TCP
            MUST send an ACK (also referred to as challenge ACK) to the remote
            peer:
      
            <SEQ=SND.NXT><ACK=RCV.NXT><CTL=ACK>
      
            After sending the acknowledgment, TCP MUST drop the unacceptable
            segment and stop processing further.
      
         By sending an ACK, the remote peer is challenged to confirm the loss
         of the previous connection and the request to start a new connection.
         A legitimate peer, after restart, would not have a TCB in the
         synchronized state.  Thus, when the ACK arrives, the peer should send
         a RST segment back with the sequence number derived from the ACK
         field that caused the RST.
      
         This RST will confirm that the remote peer has indeed closed the
         previous connection.  Upon receipt of a valid RST, the local TCP
         endpoint MUST terminate its connection.  The local TCP endpoint
         should then rely on SYN retransmission from the remote end to
         re-establish the connection.
      
      This patch lets SYN packets through the discard added in c3ae62af,
      so that spurious SYN packets are properly dealt with as per the RFC.
      
      The challenge ACK is sent unconditionally and is rate-limited, so the
      original vulnerability is not reintroduced by this patch.
      Signed-off-by: NCalvin Owens <calvinowens@fb.com>
      Acked-by: NEric Dumazet <edumazet@google.com>
      Acked-by: NNeal Cardwell <ncardwell@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0c228e83
    • E
      net: Revert "net: avoid one atomic operation in skb_clone()" · e7820e39
      Eric Dumazet 提交于
      Not sure what I was thinking, but doing anything after
      releasing a refcount is suicidal or/and embarrassing.
      
      By the time we set skb->fclone to SKB_FCLONE_FREE, another cpu
      could have released last reference and freed whole skb.
      
      We potentially corrupt memory or trap if CONFIG_DEBUG_PAGEALLOC is set.
      Reported-by: NChris Mason <clm@fb.com>
      Fixes: ce1a4ea3 ("net: avoid one atomic operation in skb_clone()")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Sabrina Dubroca <sd@queasysnail.net>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e7820e39
  6. 21 11月, 2014 2 次提交
    • J
      ipx: fix locking regression in ipx_sendmsg and ipx_recvmsg · 01462405
      Jiri Bohac 提交于
      This fixes an old regression introduced by commit
      b0d0d915 (ipx: remove the BKL).
      
      When a recvmsg syscall blocks waiting for new data, no data can be sent on the
      same socket with sendmsg because ipx_recvmsg() sleeps with the socket locked.
      
      This breaks mars-nwe (NetWare emulator):
      - the ncpserv process reads the request using recvmsg
      - ncpserv forks and spawns nwconn
      - ncpserv calls a (blocking) recvmsg and waits for new requests
      - nwconn deadlocks in sendmsg on the same socket
      
      Commit b0d0d915 has simply replaced BKL locking with
      lock_sock/release_sock. Unlike now, BKL got unlocked while
      sleeping, so a blocking recvmsg did not block a concurrent
      sendmsg.
      
      Only keep the socket locked while actually working with the socket data and
      release it prior to calling skb_recv_datagram().
      Signed-off-by: NJiri Bohac <jbohac@suse.cz>
      Reviewed-by: NArnd Bergmann <arnd@arndb.de>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      01462405
    • J
      openvswitch: Don't validate IPv6 label masks. · d3052bb5
      Joe Stringer 提交于
      When userspace doesn't provide a mask, OVS datapath generates a fully
      unwildcarded mask for the flow by copying the flow and setting all bits
      in all fields. For IPv6 label, this creates a mask that matches on the
      upper 12 bits, causing the following error:
      
      openvswitch: netlink: Invalid IPv6 flow label value (value=ffffffff, max=fffff)
      
      This patch ignores the label validation check for masks, avoiding this
      error.
      Signed-off-by: NJoe Stringer <joestringer@nicira.com>
      Acked-by: NPravin B Shelar <pshelar@nicira.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d3052bb5
  7. 20 11月, 2014 1 次提交
  8. 19 11月, 2014 1 次提交
  9. 17 11月, 2014 5 次提交
    • L
      bridge: fix netfilter/NF_BR_LOCAL_OUT for own, locally generated queries · f0b4eece
      Linus Lüssing 提交于
      Ebtables on the OUTPUT chain (NF_BR_LOCAL_OUT) would not work as expected
      for both locally generated IGMP and MLD queries. The IP header specific
      filter options are off by 14 Bytes for netfilter (actual output on
      interfaces is fine).
      
      NF_HOOK() expects the skb->data to point to the IP header, not the
      ethernet one (while dev_queue_xmit() does not). Luckily there is an
      br_dev_queue_push_xmit() helper function already - let's just use that.
      
      Introduced by eb1d1641
      ("bridge: Add core IGMP snooping support")
      
      Ebtables example:
      
      $ ebtables -I OUTPUT -p IPv6 -o eth1 --logical-out br0 \
      	--log --log-level 6 --log-ip6 --log-prefix="~EBT: " -j DROP
      
      before (broken):
      
      ~EBT:  IN= OUT=eth1 MAC source = 02:04:64:a4:39:c2 \
      	MAC dest = 33:33:00:00:00:01 proto = 0x86dd IPv6 \
      	SRC=64a4:39c2:86dd:6000:0000:0020:0001:fe80 IPv6 \
      	DST=0000:0000:0000:0004:64ff:fea4:39c2:ff02, \
      	IPv6 priority=0x3, Next Header=2
      
      after (working):
      
      ~EBT:  IN= OUT=eth1 MAC source = 02:04:64:a4:39:c2 \
      	MAC dest = 33:33:00:00:00:01 proto = 0x86dd IPv6 \
      	SRC=fe80:0000:0000:0000:0004:64ff:fea4:39c2 IPv6 \
      	DST=ff02:0000:0000:0000:0000:0000:0000:0001, \
      	IPv6 priority=0x0, Next Header=0
      Signed-off-by: NLinus Lüssing <linus.luessing@web.de>
      Acked-by: NHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      f0b4eece
    • P
      netfilter: nfnetlink: fix insufficient validation in nfnetlink_bind · 97840cb6
      Pablo Neira Ayuso 提交于
      Make sure the netlink group exists, otherwise you can trigger an out
      of bound array memory access from the netlink_bind() path. This splat
      can only be triggered only by superuser.
      
      [  180.203600] UBSan: Undefined behaviour in ../net/netfilter/nfnetlink.c:467:28
      [  180.204249] index 9 is out of range for type 'int [9]'
      [  180.204697] CPU: 0 PID: 1771 Comm: trinity-main Not tainted 3.18.0-rc4-mm1+ #122
      [  180.205365] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org
      +04/01/2014
      [  180.206498]  0000000000000018 0000000000000000 0000000000000009 ffff88007bdf7da8
      [  180.207220]  ffffffff82b0ef5f 0000000000000092 ffffffff845ae2e0 ffff88007bdf7db8
      [  180.207887]  ffffffff8199e489 ffff88007bdf7e18 ffffffff8199ea22 0000003900000000
      [  180.208639] Call Trace:
      [  180.208857] dump_stack (lib/dump_stack.c:52)
      [  180.209370] ubsan_epilogue (lib/ubsan.c:174)
      [  180.209849] __ubsan_handle_out_of_bounds (lib/ubsan.c:400)
      [  180.210512] nfnetlink_bind (net/netfilter/nfnetlink.c:467)
      [  180.210986] netlink_bind (net/netlink/af_netlink.c:1483)
      [  180.211495] SYSC_bind (net/socket.c:1541)
      
      Moreover, define the missing nf_tables and nf_acct multicast groups too.
      Reported-by: NAndrey Ryabinin <a.ryabinin@samsung.com>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      97840cb6
    • D
      ipv6: mld: fix add_grhead skb_over_panic for devs with large MTUs · feb91a02
      Daniel Borkmann 提交于
      It has been reported that generating an MLD listener report on
      devices with large MTUs (e.g. 9000) and a high number of IPv6
      addresses can trigger a skb_over_panic():
      
      skbuff: skb_over_panic: text:ffffffff80612a5d len:3776 put:20
      head:ffff88046d751000 data:ffff88046d751010 tail:0xed0 end:0xec0
      dev:port1
       ------------[ cut here ]------------
      kernel BUG at net/core/skbuff.c:100!
      invalid opcode: 0000 [#1] SMP
      Modules linked in: ixgbe(O)
      CPU: 3 PID: 0 Comm: swapper/3 Tainted: G O 3.14.23+ #4
      [...]
      Call Trace:
       <IRQ>
       [<ffffffff80578226>] ? skb_put+0x3a/0x3b
       [<ffffffff80612a5d>] ? add_grhead+0x45/0x8e
       [<ffffffff80612e3a>] ? add_grec+0x394/0x3d4
       [<ffffffff80613222>] ? mld_ifc_timer_expire+0x195/0x20d
       [<ffffffff8061308d>] ? mld_dad_timer_expire+0x45/0x45
       [<ffffffff80255b5d>] ? call_timer_fn.isra.29+0x12/0x68
       [<ffffffff80255d16>] ? run_timer_softirq+0x163/0x182
       [<ffffffff80250e6f>] ? __do_softirq+0xe0/0x21d
       [<ffffffff8025112b>] ? irq_exit+0x4e/0xd3
       [<ffffffff802214bb>] ? smp_apic_timer_interrupt+0x3b/0x46
       [<ffffffff8063f10a>] ? apic_timer_interrupt+0x6a/0x70
      
      mld_newpack() skb allocations are usually requested with dev->mtu
      in size, since commit 72e09ad1 ("ipv6: avoid high order allocations")
      we have changed the limit in order to be less likely to fail.
      
      However, in MLD/IGMP code, we have some rather ugly AVAILABLE(skb)
      macros, which determine if we may end up doing an skb_put() for
      adding another record. To avoid possible fragmentation, we check
      the skb's tailroom as skb->dev->mtu - skb->len, which is a wrong
      assumption as the actual max allocation size can be much smaller.
      
      The IGMP case doesn't have this issue as commit 57e1ab6e
      ("igmp: refine skb allocations") stores the allocation size in
      the cb[].
      
      Set a reserved_tailroom to make it fit into the MTU and use
      skb_availroom() helper instead. This also allows to get rid of
      igmp_skb_size().
      Reported-by: NWei Liu <lw1a2.jing@gmail.com>
      Fixes: 72e09ad1 ("ipv6: avoid high order allocations")
      Signed-off-by: NDaniel Borkmann <dborkman@redhat.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Cc: David L Stevens <david.stevens@oracle.com>
      Acked-by: NEric Dumazet <edumazet@google.com>
      Acked-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      feb91a02
    • A
      dcbnl : Disable software interrupts before taking dcb_lock · 52cff74e
      Anish Bhatt 提交于
      Solves possible lockup issues that can be seen from firmware DCB agents calling
      into the DCB app api.
      
      DCB firmware event queues can be tied in with NAPI so that dcb events are
      generated in softIRQ context. This can results in calls to dcb_*app()
      functions which try to take the dcb_lock.
      
      If the the event triggers while we also have the dcb_lock because lldpad or
      some other agent happened to be issuing a  get/set command we could see a cpu
      lockup.
      
      This code was not originally written with firmware agents in mind, hence
      grabbing dcb_lock from softIRQ context was not considered.
      Signed-off-by: NAnish Bhatt <anish@chelsio.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      52cff74e
    • P
      ipv4: Fix incorrect error code when adding an unreachable route · 49dd18ba
      Panu Matilainen 提交于
      Trying to add an unreachable route incorrectly returns -ESRCH if
      if custom FIB rules are present:
      
      [root@localhost ~]# ip route add 74.125.31.199 dev eth0 via 1.2.3.4
      RTNETLINK answers: Network is unreachable
      [root@localhost ~]# ip rule add to 55.66.77.88 table 200
      [root@localhost ~]# ip route add 74.125.31.199 dev eth0 via 1.2.3.4
      RTNETLINK answers: No such process
      [root@localhost ~]#
      
      Commit 83886b6b ("[NET]: Change "not found"
      return value for rule lookup") changed fib_rules_lookup()
      to use -ESRCH as a "not found" code internally, but for user space it
      should be translated into -ENETUNREACH. Handle the translation centrally in
      ipv4-specific fib_lookup(), leaving the DECnet case alone.
      
      On a related note, commit b7a71b51
      ("ipv4: removed redundant conditional") removed a similar translation from
      ip_route_input_slow() prematurely AIUI.
      
      Fixes: b7a71b51 ("ipv4: removed redundant conditional")
      Signed-off-by: NPanu Matilainen <pmatilai@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      49dd18ba
  10. 15 11月, 2014 6 次提交
  11. 14 11月, 2014 5 次提交
    • I
      libceph: change from BUG to WARN for __remove_osd() asserts · cc9f1f51
      Ilya Dryomov 提交于
      No reason to use BUG_ON for osd request list assertions.
      Signed-off-by: NIlya Dryomov <idryomov@redhat.com>
      Reviewed-by: NAlex Elder <elder@linaro.org>
      cc9f1f51
    • I
      libceph: clear r_req_lru_item in __unregister_linger_request() · ba9d114e
      Ilya Dryomov 提交于
      kick_requests() can put linger requests on the notarget list.  This
      means we need to clear the much-overloaded req->r_req_lru_item in
      __unregister_linger_request() as well, or we get an assertion failure
      in ceph_osdc_release_request() - !list_empty(&req->r_req_lru_item).
      
      AFAICT the assumption was that registered linger requests cannot be on
      any of req->r_req_lru_item lists, but that's clearly not the case.
      Signed-off-by: NIlya Dryomov <idryomov@redhat.com>
      Reviewed-by: NAlex Elder <elder@linaro.org>
      ba9d114e
    • I
      libceph: unlink from o_linger_requests when clearing r_osd · a390de02
      Ilya Dryomov 提交于
      Requests have to be unlinked from both osd->o_requests (normal
      requests) and osd->o_linger_requests (linger requests) lists when
      clearing req->r_osd.  Otherwise __unregister_linger_request() gets
      confused and we trip over a !list_empty(&osd->o_linger_requests)
      assert in __remove_osd().
      
      MON=1 OSD=1:
      
          # cat remove-osd.sh
          #!/bin/bash
          rbd create --size 1 test
          DEV=$(rbd map test)
          ceph osd out 0
          sleep 3
          rbd map dne/dne # obtain a new osdmap as a side effect
          rbd unmap $DEV & # will block
          sleep 3
          ceph osd in 0
      Signed-off-by: NIlya Dryomov <idryomov@redhat.com>
      Reviewed-by: NAlex Elder <elder@linaro.org>
      a390de02
    • I
      libceph: do not crash on large auth tickets · aaef3170
      Ilya Dryomov 提交于
      Large (greater than 32k, the value of PAGE_ALLOC_COSTLY_ORDER) auth
      tickets will have their buffers vmalloc'ed, which leads to the
      following crash in crypto:
      
      [   28.685082] BUG: unable to handle kernel paging request at ffffeb04000032c0
      [   28.686032] IP: [<ffffffff81392b42>] scatterwalk_pagedone+0x22/0x80
      [   28.686032] PGD 0
      [   28.688088] Oops: 0000 [#1] PREEMPT SMP
      [   28.688088] Modules linked in:
      [   28.688088] CPU: 0 PID: 878 Comm: kworker/0:2 Not tainted 3.17.0-vm+ #305
      [   28.688088] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
      [   28.688088] Workqueue: ceph-msgr con_work
      [   28.688088] task: ffff88011a7f9030 ti: ffff8800d903c000 task.ti: ffff8800d903c000
      [   28.688088] RIP: 0010:[<ffffffff81392b42>]  [<ffffffff81392b42>] scatterwalk_pagedone+0x22/0x80
      [   28.688088] RSP: 0018:ffff8800d903f688  EFLAGS: 00010286
      [   28.688088] RAX: ffffeb04000032c0 RBX: ffff8800d903f718 RCX: ffffeb04000032c0
      [   28.688088] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff8800d903f750
      [   28.688088] RBP: ffff8800d903f688 R08: 00000000000007de R09: ffff8800d903f880
      [   28.688088] R10: 18df467c72d6257b R11: 0000000000000000 R12: 0000000000000010
      [   28.688088] R13: ffff8800d903f750 R14: ffff8800d903f8a0 R15: 0000000000000000
      [   28.688088] FS:  00007f50a41c7700(0000) GS:ffff88011fc00000(0000) knlGS:0000000000000000
      [   28.688088] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      [   28.688088] CR2: ffffeb04000032c0 CR3: 00000000da3f3000 CR4: 00000000000006b0
      [   28.688088] Stack:
      [   28.688088]  ffff8800d903f698 ffffffff81392ca8 ffff8800d903f6e8 ffffffff81395d32
      [   28.688088]  ffff8800dac96000 ffff880000000000 ffff8800d903f980 ffff880119b7e020
      [   28.688088]  ffff880119b7e010 0000000000000000 0000000000000010 0000000000000010
      [   28.688088] Call Trace:
      [   28.688088]  [<ffffffff81392ca8>] scatterwalk_done+0x38/0x40
      [   28.688088]  [<ffffffff81392ca8>] scatterwalk_done+0x38/0x40
      [   28.688088]  [<ffffffff81395d32>] blkcipher_walk_done+0x182/0x220
      [   28.688088]  [<ffffffff813990bf>] crypto_cbc_encrypt+0x15f/0x180
      [   28.688088]  [<ffffffff81399780>] ? crypto_aes_set_key+0x30/0x30
      [   28.688088]  [<ffffffff8156c40c>] ceph_aes_encrypt2+0x29c/0x2e0
      [   28.688088]  [<ffffffff8156d2a3>] ceph_encrypt2+0x93/0xb0
      [   28.688088]  [<ffffffff8156d7da>] ceph_x_encrypt+0x4a/0x60
      [   28.688088]  [<ffffffff8155b39d>] ? ceph_buffer_new+0x5d/0xf0
      [   28.688088]  [<ffffffff8156e837>] ceph_x_build_authorizer.isra.6+0x297/0x360
      [   28.688088]  [<ffffffff8112089b>] ? kmem_cache_alloc_trace+0x11b/0x1c0
      [   28.688088]  [<ffffffff8156b496>] ? ceph_auth_create_authorizer+0x36/0x80
      [   28.688088]  [<ffffffff8156ed83>] ceph_x_create_authorizer+0x63/0xd0
      [   28.688088]  [<ffffffff8156b4b4>] ceph_auth_create_authorizer+0x54/0x80
      [   28.688088]  [<ffffffff8155f7c0>] get_authorizer+0x80/0xd0
      [   28.688088]  [<ffffffff81555a8b>] prepare_write_connect+0x18b/0x2b0
      [   28.688088]  [<ffffffff81559289>] try_read+0x1e59/0x1f10
      
      This is because we set up crypto scatterlists as if all buffers were
      kmalloc'ed.  Fix it.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: NIlya Dryomov <idryomov@redhat.com>
      Reviewed-by: NSage Weil <sage@redhat.com>
      aaef3170
    • J
      sunrpc: fix sleeping under rcu_read_lock in gss_stringify_acceptor · b3ecba09
      Jeff Layton 提交于
      Bruce reported that he was seeing the following BUG pop:
      
          BUG: sleeping function called from invalid context at mm/slab.c:2846
          in_atomic(): 0, irqs_disabled(): 0, pid: 4539, name: mount.nfs
          2 locks held by mount.nfs/4539:
          #0:  (nfs_clid_init_mutex){+.+.+.}, at: [<ffffffffa01c0a9a>] nfs4_discover_server_trunking+0x4a/0x2f0 [nfsv4]
          #1:  (rcu_read_lock){......}, at: [<ffffffffa00e3185>] gss_stringify_acceptor+0x5/0xb0 [auth_rpcgss]
          Preemption disabled at:[<ffffffff81a4f082>] printk+0x4d/0x4f
      
          CPU: 3 PID: 4539 Comm: mount.nfs Not tainted 3.18.0-rc1-00013-g5b095e99 #3393
          Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
          ffff880021499390 ffff8800381476a8 ffffffff81a534cf 0000000000000001
          0000000000000000 ffff8800381476c8 ffffffff81097854 00000000000000d0
          0000000000000018 ffff880038147718 ffffffff8118e4f3 0000000020479f00
          Call Trace:
          [<ffffffff81a534cf>] dump_stack+0x4f/0x7c
          [<ffffffff81097854>] __might_sleep+0x114/0x180
          [<ffffffff8118e4f3>] __kmalloc+0x1a3/0x280
          [<ffffffffa00e31d8>] gss_stringify_acceptor+0x58/0xb0 [auth_rpcgss]
          [<ffffffffa00e3185>] ? gss_stringify_acceptor+0x5/0xb0 [auth_rpcgss]
          [<ffffffffa006b438>] rpcauth_stringify_acceptor+0x18/0x30 [sunrpc]
          [<ffffffffa01b0469>] nfs4_proc_setclientid+0x199/0x380 [nfsv4]
          [<ffffffffa01b04d0>] ? nfs4_proc_setclientid+0x200/0x380 [nfsv4]
          [<ffffffffa01bdf1a>] nfs40_discover_server_trunking+0xda/0x150 [nfsv4]
          [<ffffffffa01bde45>] ? nfs40_discover_server_trunking+0x5/0x150 [nfsv4]
          [<ffffffffa01c0acf>] nfs4_discover_server_trunking+0x7f/0x2f0 [nfsv4]
          [<ffffffffa01c8e24>] nfs4_init_client+0x104/0x2f0 [nfsv4]
          [<ffffffffa01539b4>] nfs_get_client+0x314/0x3f0 [nfs]
          [<ffffffffa0153780>] ? nfs_get_client+0xe0/0x3f0 [nfs]
          [<ffffffffa01c83aa>] nfs4_set_client+0x8a/0x110 [nfsv4]
          [<ffffffffa0069708>] ? __rpc_init_priority_wait_queue+0xa8/0xf0 [sunrpc]
          [<ffffffffa01c9b2f>] nfs4_create_server+0x12f/0x390 [nfsv4]
          [<ffffffffa01c1472>] nfs4_remote_mount+0x32/0x60 [nfsv4]
          [<ffffffff81196489>] mount_fs+0x39/0x1b0
          [<ffffffff81166145>] ? __alloc_percpu+0x15/0x20
          [<ffffffff811b276b>] vfs_kern_mount+0x6b/0x150
          [<ffffffffa01c1396>] nfs_do_root_mount+0x86/0xc0 [nfsv4]
          [<ffffffffa01c1784>] nfs4_try_mount+0x44/0xc0 [nfsv4]
          [<ffffffffa01549b7>] ? get_nfs_version+0x27/0x90 [nfs]
          [<ffffffffa0161a2d>] nfs_fs_mount+0x47d/0xd60 [nfs]
          [<ffffffff81a59c5e>] ? mutex_unlock+0xe/0x10
          [<ffffffffa01606a0>] ? nfs_remount+0x430/0x430 [nfs]
          [<ffffffffa01609c0>] ? nfs_clone_super+0x140/0x140 [nfs]
          [<ffffffff81196489>] mount_fs+0x39/0x1b0
          [<ffffffff81166145>] ? __alloc_percpu+0x15/0x20
          [<ffffffff811b276b>] vfs_kern_mount+0x6b/0x150
          [<ffffffff811b5830>] do_mount+0x210/0xbe0
          [<ffffffff811b54ca>] ? copy_mount_options+0x3a/0x160
          [<ffffffff811b651f>] SyS_mount+0x6f/0xb0
          [<ffffffff81a5c852>] system_call_fastpath+0x12/0x17
      
      Sleeping under the rcu_read_lock is bad. This patch fixes it by dropping
      the rcu_read_lock before doing the allocation and then reacquiring it
      and redoing the dereference before doing the copy. If we find that the
      string has somehow grown in the meantime, we'll reallocate and try again.
      
      Cc: <stable@vger.kernel.org> # v3.17+
      Reported-by: N"J. Bruce Fields" <bfields@fieldses.org>
      Signed-off-by: NJeff Layton <jlayton@primarydata.com>
      Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
      b3ecba09
  12. 13 11月, 2014 1 次提交
  13. 12 11月, 2014 7 次提交
    • P
      netfilter: nf_tables: restore synchronous object release from commit/abort · b326dd37
      Pablo Neira Ayuso 提交于
      The existing xtables matches and targets, when used from nft_compat, may
      sleep from the destroy path, ie. when removing rules. Since the objects
      are released via call_rcu from softirq context, this results in lockdep
      splats and possible lockups that may be hard to reproduce.
      
      Patrick also indicated that delayed object release via call_rcu can
      cause us problems in the ordering of event notifications when anonymous
      sets are in place.
      
      So, this patch restores the synchronous object release from the commit
      and abort paths. This includes a call to synchronize_rcu() to make sure
      that no packets are walking on the objects that are going to be
      released. This is slowier though, but it's simple and it resolves the
      aforementioned problems.
      
      This is a partial revert of c7c32e72 ("netfilter: nf_tables: defer all
      object release via rcu") that was introduced in 3.16 to speed up
      interaction with userspace.
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      b326dd37
    • P
      netfilter: nft_compat: use the match->table to validate dependencies · afefb6f9
      Pablo Neira Ayuso 提交于
      Instead of the match->name, which is of course not relevant.
      
      Fixes: f3f5dded ("netfilter: nft_compat: validate chain type in match/target")
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      afefb6f9
    • P
      netfilter: nft_compat: relax chain type validation · c918687f
      Pablo Neira Ayuso 提交于
      Check for nat chain dependency only, which is the one that can
      actually crash the kernel. Don't care if mangle, filter and security
      specific match and targets are used out of their scope, they are
      harmless.
      
      This restores iptables-compat with mangle specific match/target when
      used out of the OUTPUT chain, that are actually emulated through filter
      chains, which broke when performing strict validation.
      
      Fixes: f3f5dded ("netfilter: nft_compat: validate chain type in match/target")
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      c918687f
    • P
      netfilter: nft_compat: use current net namespace · 2daf1b4d
      Pablo Neira Ayuso 提交于
      Instead of init_net when using xtables over nftables compat.
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      2daf1b4d
    • C
      ipvs: Keep skb->sk when allocating headroom on tunnel xmit · 50656d9d
      Calvin Owens 提交于
      ip_vs_prepare_tunneled_skb() ignores ->sk when allocating a new
      skb, either unconditionally setting ->sk to NULL or allowing
      the uninitialized ->sk from a newly allocated skb to leak through
      to the caller.
      
      This patch properly copies ->sk and increments its reference count.
      Signed-off-by: NCalvin Owens <calvinowens@fb.com>
      Acked-by: NJulian Anastasov <ja@ssi.bg>
      Signed-off-by: NSimon Horman <horms@verge.net.au>
      50656d9d
    • E
      ipv6: fix IPV6_PKTINFO with v4 mapped · 5337b5b7
      Eric Dumazet 提交于
      Use IS_ENABLED(CONFIG_IPV6), to enable this code if IPv6 is
      a module.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Fixes: c8e6ad08 ("ipv6: honor IPV6_PKTINFO with v4 mapped addresses on sendmsg")
      Acked-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5337b5b7
    • D
      net: sctp: fix memory leak in auth key management · 4184b2a7
      Daniel Borkmann 提交于
      A very minimal and simple user space application allocating an SCTP
      socket, setting SCTP_AUTH_KEY setsockopt(2) on it and then closing
      the socket again will leak the memory containing the authentication
      key from user space:
      
      unreferenced object 0xffff8800837047c0 (size 16):
        comm "a.out", pid 2789, jiffies 4296954322 (age 192.258s)
        hex dump (first 16 bytes):
          01 00 00 00 04 00 00 00 00 00 00 00 00 00 00 00  ................
        backtrace:
          [<ffffffff816d7e8e>] kmemleak_alloc+0x4e/0xb0
          [<ffffffff811c88d8>] __kmalloc+0xe8/0x270
          [<ffffffffa0870c23>] sctp_auth_create_key+0x23/0x50 [sctp]
          [<ffffffffa08718b1>] sctp_auth_set_key+0xa1/0x140 [sctp]
          [<ffffffffa086b383>] sctp_setsockopt+0xd03/0x1180 [sctp]
          [<ffffffff815bfd94>] sock_common_setsockopt+0x14/0x20
          [<ffffffff815beb61>] SyS_setsockopt+0x71/0xd0
          [<ffffffff816e58a9>] system_call_fastpath+0x12/0x17
          [<ffffffffffffffff>] 0xffffffffffffffff
      
      This is bad because of two things, we can bring down a machine from
      user space when auth_enable=1, but also we would leave security sensitive
      keying material in memory without clearing it after use. The issue is
      that sctp_auth_create_key() already sets the refcount to 1, but after
      allocation sctp_auth_set_key() does an additional refcount on it, and
      thus leaving it around when we free the socket.
      
      Fixes: 65b07e5d ("[SCTP]: API updates to suport SCTP-AUTH extensions.")
      Signed-off-by: NDaniel Borkmann <dborkman@redhat.com>
      Cc: Vlad Yasevich <vyasevich@gmail.com>
      Acked-by: NNeil Horman <nhorman@tuxdriver.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4184b2a7