1. 27 9月, 2016 4 次提交
    • J
      nfsd: add a LRU list for blocked locks · 7919d0a2
      Jeff Layton 提交于
      It's possible for a client to call in on a lock that is blocked for a
      long time, but discontinue polling for it. A malicious client could
      even set a lock on a file, and then spam the server with failing lock
      requests from different lockowners that pile up in a DoS attack.
      
      Add the blocked lock structures to a per-net namespace LRU when hashing
      them, and timestamp them. If the lock request is not revisited after a
      lease period, we'll drop it under the assumption that the client is no
      longer interested.
      
      This also gives us a mechanism to clean up these objects at server
      shutdown time as well.
      Signed-off-by: NJeff Layton <jlayton@redhat.com>
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      7919d0a2
    • J
      nfsd: have nfsd4_lock use blocking locks for v4.1+ locks · 76d348fa
      Jeff Layton 提交于
      Create a new per-lockowner+per-inode structure that contains a
      file_lock. Have nfsd4_lock add this structure to the lockowner's list
      prior to setting the lock. Then call the vfs and request a blocking lock
      (by setting FL_SLEEP). If we get anything besides FILE_LOCK_DEFERRED
      back, then we dequeue the block structure and free it. When the next
      lock request comes in, we'll look for an existing block for the same
      filehandle and dequeue and reuse it if there is one.
      
      When the lock comes free (a'la an lm_notify call), we dequeue it
      from the lockowner's list and kick off a CB_NOTIFY_LOCK callback to
      inform the client that it should retry the lock request.
      Signed-off-by: NJeff Layton <jlayton@redhat.com>
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      76d348fa
    • J
      nfsd: plumb in a CB_NOTIFY_LOCK operation · a188620e
      Jeff Layton 提交于
      Add the encoding/decoding for CB_NOTIFY_LOCK operations.
      Signed-off-by: NJeff Layton <jlayton@redhat.com>
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      a188620e
    • V
      NFSD: fix corruption in notifier registration · 1eca45f8
      Vasily Averin 提交于
      By design notifier can be registered once only, however nfsd registers
      the same inetaddr notifiers per net-namespace.  When this happen it
      corrupts list of notifiers, as result some notifiers can be not called
      on proper event, traverse on list can be cycled forever, and second
      unregister can access already freed memory.
      
      Cc: stable@vger.kernel.org
      fixes: 36684996 ("nfsd: Register callbacks on the inetaddr_chain and inet6addr_chain")
      Signed-off-by: NVasily Averin <vvs@virtuozzo.com>
      Reviewed-by: NJeff Layton <jlayton@redhat.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      1eca45f8
  2. 23 9月, 2016 6 次提交
    • C
      svcrdma: support Remote Invalidation · 25d55296
      Chuck Lever 提交于
      Support Remote Invalidation. A private message is exchanged with
      the client upon RDMA transport connect that indicates whether
      Send With Invalidation may be used by the server to send RPC
      replies. The invalidate_rkey is arbitrarily chosen from among
      rkeys present in the RPC-over-RDMA header's chunk lists.
      
      Send With Invalidate improves performance only when clients can
      recognize, while processing an RPC reply, that an rkey has already
      been invalidated. That has been submitted as a separate change.
      
      In the future, the RPC-over-RDMA protocol might support Remote
      Invalidation properly. The protocol needs to enable signaling
      between peers to indicate when Remote Invalidation can be used
      for each individual RPC.
      Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
      Reviewed-by: NSagi Grimberg <sagi@grimberg.me>
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      25d55296
    • C
      svcrdma: Server-side support for rpcrdma_connect_private · cc9d8340
      Chuck Lever 提交于
      Prepare to receive an RDMA-CM private message when handling a new
      connection attempt, and send a similar message as part of connection
      acceptance.
      
      Both sides can communicate their various implementation limits.
      Implementations that don't support this sideband protocol ignore it.
      Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
      Reviewed-by: NSagi Grimberg <sagi@grimberg.me>
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      cc9d8340
    • C
      rpcrdma: RDMA/CM private message data structure · 5d487096
      Chuck Lever 提交于
      Introduce data structure used by both client and server to exchange
      implementation details during RDMA/CM connection establishment.
      
      This is an experimental out-of-band exchange between Linux
      RPC-over-RDMA Version One implementations, replacing the deprecated
      CCP (see RFC 5666bis). The purpose of this extension is to enable
      prototyping of features that might be introduced in a subsequent
      version of RPC-over-RDMA.
      
      Suggested by Christoph Hellwig and Devesh Sharma.
      Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
      Reviewed-by: NSagi Grimberg <sagi@grimberg.me>
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      5d487096
    • C
      svcrdma: Skip put_page() when send_reply() fails · 9995237b
      Chuck Lever 提交于
      Message from syslogd@klimt at Aug 18 17:00:37 ...
       kernel:page:ffffea0020639b00 count:0 mapcount:0 mapping:          (null) index:0x0
      Aug 18 17:00:37 klimt kernel: flags: 0x2fffff80000000()
      Aug 18 17:00:37 klimt kernel: page dumped because: VM_BUG_ON_PAGE(page_ref_count(page) == 0)
      
      Aug 18 17:00:37 klimt kernel: kernel BUG at /home/cel/src/linux/linux-2.6/include/linux/mm.h:445!
      Aug 18 17:00:37 klimt kernel: RIP: 0010:[<ffffffffa05c21c1>] svc_rdma_sendto+0x641/0x820 [rpcrdma]
      
      send_reply() assigns its page argument as the first page of ctxt. On
      error, send_reply() already invokes svc_rdma_put_context(ctxt, 1);
      which does a put_page() on that very page. No need to do that again
      as svc_rdma_sendto exits.
      
      Fixes: 3e1eeb98 ("svcrdma: Close connection when a send error occurs")
      Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      9995237b
    • C
      svcrdma: Tail iovec leaves an orphaned DMA mapping · cace564f
      Chuck Lever 提交于
      The ctxt's count field is overloaded to mean the number of pages in
      the ctxt->page array and the number of SGEs in the ctxt->sge array.
      Typically these two numbers are the same.
      
      However, when an inline RPC reply is constructed from an xdr_buf
      with a tail iovec, the head and tail often occupy the same page,
      but each are DMA mapped independently. In that case, ->count equals
      the number of pages, but it does not equal the number of SGEs.
      There's one more SGE, for the tail iovec. Hence there is one more
      DMA mapping than there are pages in the ctxt->page array.
      
      This isn't a real problem until the server's iommu is enabled. Then
      each RPC reply that has content in that iovec orphans a DMA mapping
      that consists of real resources.
      
      krb5i and krb5p always populate that tail iovec. After a couple
      million sent krb5i/p RPC replies, the NFS server starts behaving
      erratically. Reboot is needed to clear the problem.
      
      Fixes: 9d11b51c ("svcrdma: Fix send_reply() scatter/gather set-up")
      Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      cace564f
    • J
      nfsd: fix dprintk in nfsd4_encode_getdeviceinfo · bec782b4
      Jeff Layton 提交于
      nfserr is big-endian, so we should convert it to host-endian before
      printing it.
      Signed-off-by: NJeff Layton <jlayton@redhat.com>
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      bec782b4
  3. 17 9月, 2016 2 次提交
    • J
      nfsd: eliminate cb_minorversion field · 89dfdc96
      Jeff Layton 提交于
      We already have that info in the client pointer. No need to pass around
      a copy.
      Signed-off-by: NJeff Layton <jlayton@redhat.com>
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      89dfdc96
    • J
      nfsd: don't set a FL_LAYOUT lease for flexfiles layouts · 1983a66f
      Jeff Layton 提交于
      We currently can hit a deadlock (of sorts) when trying to use flexfiles
      layouts with XFS. XFS will call break_layout when something wants to
      write to the file. In the case of the (super-simple) flexfiles layout
      driver in knfsd, the MDS and DS are the same machine.
      
      The client can get a layout and then issue a v3 write to do its I/O. XFS
      will then call xfs_break_layouts, which will cause a CB_LAYOUTRECALL to
      be issued to the client. The client however can't return the layout
      until the v3 WRITE completes, but XFS won't allow the write to proceed
      until the layout is returned.
      
      Christoph says:
      
          XFS only cares about block-like layouts where the client has direct
          access to the file blocks.  I'd need to look how to propagate the
          flag into break_layout, but in principle we don't need to do any
          recalls on truncate ever for file and flexfile layouts.
      
      If we're never going to recall the layout, then we don't even need to
      set the lease at all. Just skip doing so on flexfiles layouts by
      adding a new flag to struct nfsd4_layout_ops and skipping the lease
      setting and removal when that flag is true.
      
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: NJeff Layton <jlayton@redhat.com>
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      1983a66f
  4. 13 9月, 2016 1 次提交
    • C
      svcauth_gss: Revert 64c59a37 ("Remove unnecessary allocation") · bf2c4b6f
      Chuck Lever 提交于
      rsc_lookup steals the passed-in memory to avoid doing an allocation of
      its own, so we can't just pass in a pointer to memory that someone else
      is using.
      
      If we really want to avoid allocation there then maybe we should
      preallocate somwhere, or reference count these handles.
      
      For now we should revert.
      
      On occasion I see this on my server:
      
      kernel: kernel BUG at /home/cel/src/linux/linux-2.6/mm/slub.c:3851!
      kernel: invalid opcode: 0000 [#1] SMP
      kernel: Modules linked in: cts rpcsec_gss_krb5 sb_edac edac_core x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd btrfs xor iTCO_wdt iTCO_vendor_support raid6_pq pcspkr i2c_i801 i2c_smbus lpc_ich mfd_core mei_me sg mei shpchp wmi ioatdma ipmi_si ipmi_msghandler acpi_pad acpi_power_meter rpcrdma ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm nfsd nfs_acl lockd grace auth_rpcgss sunrpc ip_tables xfs libcrc32c mlx4_ib mlx4_en ib_core sr_mod cdrom sd_mod ast drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm crc32c_intel igb mlx4_core ahci libahci libata ptp pps_core dca i2c_algo_bit i2c_core dm_mirror dm_region_hash dm_log dm_mod
      kernel: CPU: 7 PID: 145 Comm: kworker/7:2 Not tainted 4.8.0-rc4-00006-g9d06b0b #15
      kernel: Hardware name: Supermicro Super Server/X10SRL-F, BIOS 1.0c 09/09/2015
      kernel: Workqueue: events do_cache_clean [sunrpc]
      kernel: task: ffff8808541d8000 task.stack: ffff880854344000
      kernel: RIP: 0010:[<ffffffff811e7075>]  [<ffffffff811e7075>] kfree+0x155/0x180
      kernel: RSP: 0018:ffff880854347d70  EFLAGS: 00010246
      kernel: RAX: ffffea0020fe7660 RBX: ffff88083f9db064 RCX: 146ff0f9d5ec5600
      kernel: RDX: 000077ff80000000 RSI: ffff880853f01500 RDI: ffff88083f9db064
      kernel: RBP: ffff880854347d88 R08: ffff8808594ee000 R09: ffff88087fdd8780
      kernel: R10: 0000000000000000 R11: ffffea0020fe76c0 R12: ffff880853f01500
      kernel: R13: ffffffffa013cf76 R14: ffffffffa013cff0 R15: ffffffffa04253a0
      kernel: FS:  0000000000000000(0000) GS:ffff88087fdc0000(0000) knlGS:0000000000000000
      kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      kernel: CR2: 00007fed60b020c3 CR3: 0000000001c06000 CR4: 00000000001406e0
      kernel: Stack:
      kernel: ffff8808589f2f00 ffff880853f01500 0000000000000001 ffff880854347da0
      kernel: ffffffffa013cf76 ffff8808589f2f00 ffff880854347db8 ffffffffa013d006
      kernel: ffff8808589f2f20 ffff880854347e00 ffffffffa0406f60 0000000057c7044f
      kernel: Call Trace:
      kernel: [<ffffffffa013cf76>] rsc_free+0x16/0x90 [auth_rpcgss]
      kernel: [<ffffffffa013d006>] rsc_put+0x16/0x30 [auth_rpcgss]
      kernel: [<ffffffffa0406f60>] cache_clean+0x2e0/0x300 [sunrpc]
      kernel: [<ffffffffa04073ee>] do_cache_clean+0xe/0x70 [sunrpc]
      kernel: [<ffffffff8109a70f>] process_one_work+0x1ff/0x3b0
      kernel: [<ffffffff8109b15c>] worker_thread+0x2bc/0x4a0
      kernel: [<ffffffff8109aea0>] ? rescuer_thread+0x3a0/0x3a0
      kernel: [<ffffffff810a0ba4>] kthread+0xe4/0xf0
      kernel: [<ffffffff8169c47f>] ret_from_fork+0x1f/0x40
      kernel: [<ffffffff810a0ac0>] ? kthread_stop+0x110/0x110
      kernel: Code: f7 ff ff eb 3b 65 8b 05 da 30 e2 7e 89 c0 48 0f a3 05 a0 38 b8 00 0f 92 c0 84 c0 0f 85 d1 fe ff ff 0f 1f 44 00 00 e9 f5 fe ff ff <0f> 0b 49 8b 03 31 f6 f6 c4 40 0f 85 62 ff ff ff e9 61 ff ff ff
      kernel: RIP  [<ffffffff811e7075>] kfree+0x155/0x180
      kernel: RSP <ffff880854347d70>
      kernel: ---[ end trace 3fdec044969def26 ]---
      
      It seems to be most common after a server reboot where a client has been
      using a Kerberos mount, and reconnects to continue its workload.
      Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      bf2c4b6f
  5. 12 9月, 2016 4 次提交
    • L
      Linux 4.8-rc6 · 9395452b
      Linus Torvalds 提交于
      9395452b
    • L
      nvme: make NVME_RDMA depend on BLOCK · bd0b841f
      Linus Torvalds 提交于
      Commit aa719874 ("nvme: fabrics drivers don't need the nvme-pci
      driver") removed the dependency on BLK_DEV_NVME, but the cdoe does
      depend on the block layer (which used to be an implicit dependency
      through BLK_DEV_NVME).
      
      Otherwise you get various errors from the kbuild test robot random
      config testing when that happens to hit a configuration with BLOCK
      device support disabled.
      
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Jay Freyensee <james_p_freyensee@linux.intel.com>
      Cc: Sagi Grimberg <sagi@grimberg.me>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      bd0b841f
    • L
      Merge tag 'staging-4.8-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging · 2afe669a
      Linus Torvalds 提交于
      Pull IIO fixes from Greg KH:
       "Here are a few small IIO fixes for 4.8-rc6.
      
        Nothing major, full details are in the shortlog, all of these have
        been in linux-next with no reported issues"
      
      * tag 'staging-4.8-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
        iio:core: fix IIO_VAL_FRACTIONAL sign handling
        iio: ensure ret is initialized to zero before entering do loop
        iio: accel: kxsd9: Fix scaling bug
        iio: accel: bmc150: reset chip at init time
        iio: fix pressure data output unit in hid-sensor-attributes
        tools:iio:iio_generic_buffer: fix trigger-less mode
      2afe669a
    • L
      Merge tag 'usb-4.8-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb · 61c3dae6
      Linus Torvalds 提交于
      Pull USB fixes from Greg KH:
       "Here are some small USB gadget, phy, and xhci fixes for 4.8-rc6.
      
        All of these resolve minor issues that have been reported, and all
        have been in linux-next with no reported issues"
      
      * tag 'usb-4.8-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
        usb: chipidea: udc: fix NULL ptr dereference in isr_setup_status_phase
        xhci: fix null pointer dereference in stop command timeout function
        usb: dwc3: pci: fix build warning on !PM_SLEEP
        usb: gadget: prevent potenial null pointer dereference on skb->len
        usb: renesas_usbhs: fix clearing the {BRDY,BEMP}STS condition
        usb: phy: phy-generic: Check clk_prepare_enable() error
        usb: gadget: udc: renesas-usb3: clear VBOUT bit in DRD_CON
        Revert "usb: dwc3: gadget: always decrement by 1"
      61c3dae6
  6. 11 9月, 2016 3 次提交
    • L
      Merge branch 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm · 98ac9a60
      Linus Torvalds 提交于
      Pull libnvdimm fixes from Dan Williams:
       "nvdimm fixes for v4.8, two of them are tagged for -stable:
      
         - Fix devm_memremap_pages() to use track_pfn_insert().  Otherwise,
           DAX pmd mappings end up with an uncached pgprot, and unusable
           performance for the device-dax interface.  The device-dax interface
           appeared in 4.7 so this is tagged for -stable.
      
         - Fix a couple VM_BUG_ON() checks in the show_smaps() path to
           understand DAX pmd entries.  This fix is tagged for -stable.
      
         - Fix a mis-merge of the nfit machine-check handler to flip the
           polarity of an if() to match the final version of the patch that
           Vishal sent for 4.8-rc1.  Without this the nfit machine check
           handler never detects / inserts new 'badblocks' entries which
           applications use to identify lost portions of files.
      
         - For test purposes, fix the nvdimm_clear_poison() path to operate on
           legacy / simulated nvdimm memory ranges.  Without this fix a test
           can set badblocks, but never clear them on these ranges.
      
         - Fix the range checking done by dax_dev_pmd_fault().  This is not
           tagged for -stable since this problem is mitigated by specifying
           aligned resources at device-dax setup time.
      
        These patches have appeared in a next release over the past week.  The
        recent rebase you can see in the timestamps was to drop an invalid fix
        as identified by the updated device-dax unit tests [1].  The -mm
        touches have an ack from Andrew"
      
      [1]: "[ndctl PATCH 0/3] device-dax test for recent kernel bugs"
         https://lists.01.org/pipermail/linux-nvdimm/2016-September/006855.html
      
      * 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
        libnvdimm: allow legacy (e820) pmem region to clear bad blocks
        nfit, mce: Fix SPA matching logic in MCE handler
        mm: fix cache mode of dax pmd mappings
        mm: fix show_smap() for zone_device-pmd ranges
        dax: fix mapping size check
      98ac9a60
    • L
      Merge branch 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux · b8db3714
      Linus Torvalds 提交于
      Pull i2c fixes from Wolfram Sang:
       "Mostly driver bugfixes, but also a few cleanups which are nice to have
        out of the way"
      
      * 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
        i2c: rk3x: Restore clock settings at resume time
        i2c: Spelling s/acknowedge/acknowledge/
        i2c: designware: save the preset value of DW_IC_SDA_HOLD
        Documentation: i2c: slave-interface: add note for driver development
        i2c: mux: demux-pinctrl: run properly with multiple instances
        i2c: bcm-kona: fix inconsistent indenting
        i2c: rcar: use proper device with dma_mapping_error
        i2c: sh_mobile: use proper device with dma_mapping_error
        i2c: mux: demux-pinctrl: invalidate properly when switching fails
      b8db3714
    • L
      Merge tag 'for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 · 6905732c
      Linus Torvalds 提交于
      Pull fscrypto fixes fromTed Ts'o:
       "Fix some brown-paper-bag bugs for fscrypto, including one one which
        allows a malicious user to set an encryption policy on an empty
        directory which they do not own"
      
      * tag 'for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
        fscrypto: require write access to mount to set encryption policy
        fscrypto: only allow setting encryption policy on directories
        fscrypto: add authorization check for setting encryption policy
      6905732c
  7. 10 9月, 2016 18 次提交
  8. 09 9月, 2016 2 次提交
    • L
      Merge tag 'powerpc-4.8-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux · 2771fc8e
      Linus Torvalds 提交于
      Pull powerpc fixes from Michael Ellerman:
       "Fixes marked for stable:
         - Don't alias user region to other regions below PAGE_OFFSET from
           Paul Mackerras
         - Fix again csum_partial_copy_generic() on 32-bit from Christophe
           Leroy
         - Fix corrupted PE allocation bitmap on releasing PE from Gavin Shan
      
        Fixes for code merged this cycle:
         - Fix crash on releasing compound PE from Gavin Shan
         - Fix processor numbers in OPAL ICP from Benjamin Herrenschmidt
         - Fix little endian build with CONFIG_KEXEC=n from Thiago Jung
           Bauermann"
      
      * tag 'powerpc-4.8-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
        powerpc/mm: Don't alias user region to other regions below PAGE_OFFSET
        powerpc/32: Fix again csum_partial_copy_generic()
        powerpc/powernv: Fix corrupted PE allocation bitmap on releasing PE
        powerpc/powernv: Fix crash on releasing compound PE
        powerpc/xics/opal: Fix processor numbers in OPAL ICP
        powerpc/pseries: Fix little endian build with CONFIG_KEXEC=n
      2771fc8e
    • L
      Merge branch 'fixes' of git://git.armlinux.org.uk/~rmk/linux-arm · 53d5f1dc
      Linus Torvalds 提交于
      Pull ARM fixes from Russell King:
       "A few ARM fixes:
      
         - Robin Murphy noticed that the non-secure privileged entry was
           relying on undefined behaviour, which needed to be fixed.
      
         - Vladimir Murzin noticed that prov-v7 fails to build for MMUless
           configurations because a required header file wasn't included.
      
         - A bunch of fixes for StrongARM regressions found while testing
           4.8-rc on such platforms"
      
      * 'fixes' of git://git.armlinux.org.uk/~rmk/linux-arm:
        ARM: sa1100: clear reset status prior to reboot
        ARM: 8600/1: Enforce some NS-SVC initialisation
        ARM: 8599/1: mm: pull asm/memory.h explicitly
        ARM: sa1100: register clocks early
        ARM: sa1100: fix 3.6864MHz clock
      53d5f1dc