1. 13 12月, 2015 1 次提交
    • H
      osd fs: __r4w_get_page rely on PageUptodate for uptodate · 3066a967
      Hugh Dickins 提交于
      Commit 42cb14b1 ("mm: migrate dirty page without
      clear_page_dirty_for_io etc") simplified the migration of a PageDirty
      pagecache page: one stat needs moving from zone to zone and that's about
      all.
      
      It's convenient and safest for it to shift the PageDirty bit from old
      page to new, just before updating the zone stats: before copying data
      and marking the new PageUptodate.  This is all done while both pages are
      isolated and locked, just as before; and just as before, there's a
      moment when the new page is visible in the radix_tree, but not yet
      PageUptodate.  What's new is that it may now be briefly visible as
      PageDirty before it is PageUptodate.
      
      When I scoured the tree to see if this could cause a problem anywhere,
      the only places I found were in two similar functions __r4w_get_page():
      which look up a page with find_get_page() (not using page lock), then
      claim it's uptodate if it's PageDirty or PageWriteback or PageUptodate.
      
      I'm not sure whether that was right before, but now it might be wrong
      (on rare occasions): only claim the page is uptodate if PageUptodate.
      Or perhaps the page in question could never be migratable anyway?
      Signed-off-by: NHugh Dickins <hughd@google.com>
      Tested-by: NBoaz Harrosh <ooo@electrozaur.com>
      Cc: Benny Halevy <bhalevy@panasas.com>
      Cc: Trond Myklebust <trond.myklebust@primarydata.com>
      Cc: Christoph Lameter <cl@linux.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      3066a967
  2. 29 9月, 2015 1 次提交
  3. 28 3月, 2015 2 次提交
  4. 04 2月, 2015 3 次提交
  5. 20 10月, 2014 1 次提交
  6. 13 9月, 2014 1 次提交
  7. 11 9月, 2014 1 次提交
  8. 25 6月, 2014 1 次提交
  9. 29 5月, 2014 3 次提交
  10. 12 4月, 2013 1 次提交
  11. 18 2月, 2013 1 次提交
    • F
      umount oops when remove blocklayoutdriver first · 5a12cca6
      fanchaoting 提交于
      now pnfs client uses block layout, maybe we can remove
      blocklayoutdriver first. if we umount later,
      it can cause oops in unset_pnfs_layoutdriver.
      because nfss->pnfs_curr_ld->clear_layoutdriver is invalid.
      
      reproduce it:
       modprobe  blocklayoutdriver
       mount -t nfs4 -o minorversion=1 pnfsip:/ /mnt/
       rmmod blocklayoutdriver
       umount /mnt
      
      then you can see following
      
      CPU 0
      Pid: 17023, comm: umount.nfs4 Tainted: GF          O 3.7.0-rc6-pnfs #1 VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform
      RIP: 0010:[<ffffffffa04cfe6d>]  [<ffffffffa04cfe6d>] unset_pnfs_layoutdriver+0x1d/0x70 [nfsv4]
      RSP: 0018:ffff8800022d9e48  EFLAGS: 00010286
      RAX: ffffffffa04a1b00 RBX: ffff88000b013800 RCX: 0000000000000001
      RDX: ffffffff81ae8ee0 RSI: ffff880001ee94b8 RDI: ffff88000b013800
      RBP: ffff8800022d9e58 R08: 0000000000000001 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000000 R12: ffff880001ee9400
      R13: ffff8800105978c0 R14: 00007fff25846c08 R15: 0000000001bba550
      FS:  00007f45ae7f0700(0000) GS:ffff880012c00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      CR2: ffffffffa04a1b38 CR3: 0000000002c0c000 CR4: 00000000000006f0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      Process umount.nfs4 (pid: 17023, threadinfo ffff8800022d8000, task ffff880006e48aa0)
      Stack:
      ffff8800105978c0 ffff88000b013800 ffff8800022d9e78 ffffffffa04cd0ce
      ffff8800022d9e78 ffff88000b013800 ffff8800022d9ea8 ffffffffa04755a7
      ffff8800022d9ea8 ffff880002f96400 ffff88000b013800 ffff880002f96400
      Call Trace:
      [<ffffffffa04cd0ce>] nfs4_destroy_server+0x1e/0x30 [nfsv4]
      [<ffffffffa04755a7>] nfs_free_server+0xb7/0x150 [nfs]
      [<ffffffffa047d4d5>] nfs_kill_super+0x35/0x40 [nfs]
      [<ffffffff81178d35>] deactivate_locked_super+0x45/0x70
      [<ffffffff8117986a>] deactivate_super+0x4a/0x70
      [<ffffffff81193ee2>] mntput_no_expire+0xd2/0x130
      [<ffffffff81194d62>] sys_umount+0x72/0xe0
      [<ffffffff8154af59>] system_call_fastpath+0x16/0x1b
      Code: 06 e1 b8 ea ff ff ff eb 9e 0f 1f 44 00 00 55 48 89 e5 53 48 83 ec 08 66 66 66 66 90 48 8b 87 80 03 00 00 48 89 fb 48 85 c0 74 29 <48> 8b 40 38 48 85 c0 74 02 ff d0 48 8b 03 3e ff 48 04 0f 94 c2
      RIP  [<ffffffffa04cfe6d>] unset_pnfs_layoutdriver+0x1d/0x70 [nfsv4]
      RSP <ffff8800022d9e48>
      CR2: ffffffffa04a1b38
      ---[ end trace 29f75aaedda058bf ]---
      
      Signed-off-by: fanchaoting<fanchaoting@cn.fujitsu.com>
      Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
      Cc: stable@vger.kernel.org
      5a12cca6
  12. 17 10月, 2012 1 次提交
  13. 09 10月, 2012 1 次提交
  14. 03 8月, 2012 1 次提交
    • B
      pnfs-obj: Better IO pattern in case of unaligned offset · 7de6e284
      Boaz Harrosh 提交于
      Depending on layout and ARCH, ORE has some limits on max IO sizes
      which is communicated on (what else) ore_layout->max_io_length,
      which is always stripe aligned.
      This was considered as the pg_test boundary for splitting and starting
      a new IO.
      
      But in the case of a long IO where the start offset is not aligned
      what would happen is that both end of IO[N] and start of IO[N+1]
      would be unaligned, causing each IO boundary parity unit to be
      calculated and written twice.
      
      So what we do in this patch is split the very start of an unaligned
      IO, up to a stripe boundary, and then next IO's can continue fully
      aligned til the end.
      
      We might be sacrificing the case where the full unaligned IO would
      fit within a single max_io_length, but the sacrifice is well worth
      the elimination of double calculation and parity units IO.
      Actually the sacrificing is marginal and is almost unmeasurable.
      
      TODO:
      	If we know the total expected linear segment that will
      	be received, at pg_init, we could use that information
      	in many places:
      	1. blocks-layout get_layout write segment size
      	2. Better mds-threshold
      	3. In above situation for a better clean split
      
      	I will do this in future submission.
      Signed-off-by: NBoaz Harrosh <bharrosh@panasas.com>
      Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
      7de6e284
  15. 20 7月, 2012 2 次提交
    • B
      pnfs-obj: Fix __r4w_get_page when offset is beyond i_size · c999ff68
      Boaz Harrosh 提交于
      It is very common for the end of the file to be unaligned on
      stripe size. But since we know it's beyond file's end then
      the XOR should be preformed with all zeros.
      
      Old code used to just read zeros out of the OSD devices, which is a great
      waist. But what scares me more about this situation is that, we now have
      pages attached to the file's mapping that are beyond i_size. I don't
      like the kind of bugs this calls for.
      
      Fix both birds, by returning a global zero_page, if offset is beyond
      i_size.
      
      TODO:
      	Change the API to ->__r4w_get_page() so a NULL can be
      	returned without being considered as error, since XOR API
      	treats NULL entries as zero_pages.
      
      [Bug since 3.2. Should apply the same way to all Kernels since]
      CC: Stable Tree <stable@kernel.org>
      Signed-off-by: NBoaz Harrosh <bharrosh@panasas.com>
      c999ff68
    • B
      pnfs-obj: don't leak objio_state if ore_write/read fails · 9909d45a
      Boaz Harrosh 提交于
      [Bug since 3.2 Kernel]
      CC: Stable Tree <stable@kernel.org>
      Signed-off-by: NBoaz Harrosh <bharrosh@panasas.com>
      9909d45a
  16. 05 5月, 2012 1 次提交
    • T
      NFS: Fix sparse warnings · 1385b811
      Trond Myklebust 提交于
      Fix the following sparse warnings:
      
      fs/nfs/direct.c:221:6: warning: symbol 'nfs_direct_readpage_release' was
      not declared. Should it be static?
      fs/nfs/read.c:38:43: warning: non-ANSI function declaration of function
      'nfs_readhdr_alloc'
      fs/nfs/objlayout/objio_osd.c:214:5: warning: symbol '__alloc_objio_seg'
      was not declared. Should it be static?
      Reported-by: NDan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
      Cc: Fred Isaman <iisaman@netapp.com>
      Cc: Boaz Harrosh <bharrosh@panasas.com>
      1385b811
  17. 28 4月, 2012 1 次提交
    • F
      NFS: create common nfs_pgio_header for both read and write · cd841605
      Fred Isaman 提交于
      In order to avoid duplicating all the data in nfs_read_data whenever we
      split it up into multiple RPC calls (either due to a short read result
      or due to rsize < PAGE_SIZE), we split out the bits that are the same
      per RPC call into a separate "header" structure.
      
      The goal this patch moves towards is to have a single header
      refcounted by several rpc_data structures.  Thus, want to always refer
      from rpc_data to the header, and not the other way.  This patch comes
      close to that ideal, but the directio code currently needs some
      special casing, isolated in the nfs_direct_[read_write]hdr_release()
      functions.  This will be dealt with in a future patch.
      Signed-off-by: NFred Isaman <iisaman@netapp.com>
      Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
      cd841605
  18. 21 3月, 2012 1 次提交
    • S
      pnfs-obj: autologin: Add support for protocol autologin · 18d98f6c
      Sachin Bhamare 提交于
      The pnfs-objects protocol mandates that we autologin into devices not
      present in the system, according to information specified in the
      get_device_info returned from the server.
      
      The Protocol specifies two login hints.
      1. An IP address:port combination
      2. A string URI which is constructed as a URL with a protocol prefix
         followed by :// and a string as address. For each  protocol prefix
         the string-address format might be different.
      
      We only support the second option. The first option is just redundant
      to the second one.
      NOTE: The Kernel part of autologin does not parse the URI string. It
      just channels it to a user-mode script. So any new login protocols should
      only update the user-mode script which is a part of the nfs-utils package,
      but the Kernel need not change.
      
      We implement the autologin by using the call_usermodehelper() API.
      (Thanks to Steve Dickson <steved@redhat.com> for pointing it out)
      So there is no running daemon needed, and/or special setup.
      
      We Add the osd_login_prog Kernel module parameters which defaults to:
      	/sbin/osd_login
      
      Kernel try's to upcall the program specified in osd_login_prog. If the file is
      not found or the execution fails Kernel will disable any farther upcalls, by
      zeroing out  osd_login_prog, Until Admin re-enables it by setting the
      osd_login_prog parameter to a proper program.
      
      Also add text about the osd_login program command line API to:
      	Documentation/filesystems/nfs/pnfs.txt
      and documentation of the new  osd_login_prog  module parameter to:
      	Documentation/kernel-parameters.txt
      
      TODO: Add timeout option in the case osd_login program gets
                    stuck
      Signed-off-by: NSachin Bhamare <sbhamare@panasas.com>
      Signed-off-by: NBoaz Harrosh <bharrosh@panasas.com>
      Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
      18d98f6c
  19. 14 3月, 2012 1 次提交
    • B
      pnfs-obj: Uglify objio_segment allocation for the sake of the principle :-( · 5318a29c
      Boaz Harrosh 提交于
      At some past instance Linus Trovalds wrote:
      > From: Linus Torvalds <torvalds@linux-foundation.org>
      > commit a84a79e4 upstream.
      >
      > The size is always valid, but variable-length arrays generate worse code
      > for no good reason (unless the function happens to be inlined and the
      > compiler sees the length for the simple constant it is).
      >
      > Also, there seems to be some code generation problem on POWER, where
      > Henrik Bakken reports that register r28 can get corrupted under some
      > subtle circumstances (interrupt happening at the wrong time?).  That all
      > indicates some seriously broken compiler issues, but since variable
      > length arrays are bad regardless, there's little point in trying to
      > chase it down.
      >
      > "Just don't do that, then".
      
      Since then any use of "variable length arrays" has become blasphemous.
      Even in perfectly good, beautiful, perfectly safe code like the one
      below where the variable length arrays are only used as a sizeof()
      parameter, for type-safe dynamic structure allocations. GCC is not
      executing any stack allocation code.
      
      I have produced a small file which defines two functions main1(unsigned numdevs)
      and main2(unsigned numdevs). main1 uses code as before with call to malloc
      and main2 uses code as of after this patch. I compiled it as:
      	gcc -O2 -S see_asm.c
      and here is what I get:
      
      <see_asm.s>
      main1:
      .LFB7:
      	.cfi_startproc
      	mov	%edi, %edi
      	leaq	4(%rdi,%rdi), %rdi
      	salq	$3, %rdi
      	jmp	malloc
      	.cfi_endproc
      .LFE7:
      	.size	main1, .-main1
      	.p2align 4,,15
      	.globl	main2
      	.type	main2, @function
      main2:
      .LFB8:
      	.cfi_startproc
      	mov	%edi, %edi
      	addq	$2, %rdi
      	salq	$4, %rdi
      	jmp	malloc
      	.cfi_endproc
      .LFE8:
      	.size	main2, .-main2
      	.section	.text.startup,"ax",@progbits
      	.p2align 4,,15
      </see_asm.s>
      
      *Exact* same code !!!
      
      So please seriously consider not accepting this patch and leave the
      perfectly good code intact.
      
      CC: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NBoaz Harrosh <bharrosh@panasas.com>
      Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
      5318a29c
  20. 07 2月, 2012 1 次提交
  21. 06 1月, 2012 1 次提交
    • B
      pnfs-obj: Must return layout on IO error · fe0fe835
      Boaz Harrosh 提交于
      As mandated by the standard. In case of an IO error, a pNFS
      objects layout driver must return it's layout. This is because
      all device errors are reported to the server as part of the
      layout return buffer.
      
      This is implemented the same way PNFS_LAYOUTRET_ON_SETATTR
      is done, through a bit flag on the pnfs_layoutdriver_type->flags
      member. The flag is set by the layout driver that wants a
      layout_return preformed at pnfs_ld_{write,read}_done in case
      of an error.
      (Though I have not defined a wrapper like pnfs_ld_layoutret_on_setattr
       because this code is never called outside of pnfs.c and pnfs IO
       paths)
      
      Without this patch 3.[0-2] Kernels leak memory and have an annoying
      WARN_ON after every IO error utilizing the pnfs-obj driver.
      
      [This patch is for 3.2 Kernel. 3.1/0 Kernels need a different patch]
      CC: Stable Tree <stable@kernel.org>
      Signed-off-by: NBoaz Harrosh <bharrosh@panasas.com>
      Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
      fe0fe835
  22. 03 11月, 2011 7 次提交
  23. 04 8月, 2011 2 次提交
    • B
      pnfs-obj: Fix the comp_index != 0 case · 9af7db32
      Boaz Harrosh 提交于
      There were bugs in the case of partial layout where olo_comp_index
      is not zero. This used to work and was tested but one of the later
      cleanup SQUASHMEs broke it and was not tested since.
      
      Also add a dprint that specify those received layout parameters.
      Everything else was already printed.
      
      [Needed in v3.0]
      CC: Stable Tree <stable@kernel.org>
      Signed-off-by: NBoaz Harrosh <bharrosh@panasas.com>
      Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
      9af7db32
    • B
      pnfs-obj: Bug when we are running out of bio · 20618b21
      Boaz Harrosh 提交于
      When we have a situation that the number of pages we want
      to encode is bigger then the size of the bio. (Which can
      currently happen only when all IO is going to a single device
      .e.g group_width==1) then the IO is submitted short and we
      report back only the amount of bytes we actually wrote/read
      and all is fine. BUT ...
      
      There was a bug that the current length counter was advanced
      before the fail to add the extra page, and we come to a situation
      that the CDB length was one-page longer then the actual bio size,
      which is of course rejected by the osd-target.
      
      While here also fix the bio size calculation, in the case
      that we received more then one group of devices.
      
      CC: Stable Tree <stable@kernel.org>
      Signed-off-by: NBoaz Harrosh <bharrosh@panasas.com>
      Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
      20618b21
  24. 16 7月, 2011 1 次提交
  25. 15 7月, 2011 2 次提交
  26. 13 7月, 2011 1 次提交