1. 19 7月, 2010 1 次提交
    • P
      [S390] dasd: use correct label location for diag fba disks · cffab6bc
      Peter Oberparleiter 提交于
      Partition boundary calculation fails for DASD FBA disks under the
      following conditions:
      - disk is formatted with CMS FORMAT with a blocksize of more than
        512 bytes
      - all of the disk is reserved to a single CMS file using CMS RESERVE
      - the disk is accessed using the DIAG mode of the DASD driver
      
      Under these circumstances, the partition detection code tries to
      read the CMS label block containing partition-relevant information
      from logical block offset 1, while it is in fact located at physical
      block offset 1.
      
      Fix this problem by using the correct CMS label block location
      depending on the device type as determined by the DASD SENSE ID
      information.
      Signed-off-by: NPeter Oberparleiter <peter.oberparleiter@de.ibm.com>
      Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
      cffab6bc
  2. 17 7月, 2010 1 次提交
  3. 16 7月, 2010 3 次提交
    • J
      jbd2/ocfs2: Fix block checksumming when a buffer is used in several transactions · 13ceef09
      Jan Kara 提交于
      OCFS2 uses t_commit trigger to compute and store checksum of the just
      committed blocks. When a buffer has b_frozen_data, checksum is computed
      for it instead of b_data but this can result in an old checksum being
      written to the filesystem in the following scenario:
      
      1) transaction1 is opened
      2) handle1 is opened
      3) journal_access(handle1, bh)
          - This sets jh->b_transaction to transaction1
      4) modify(bh)
      5) journal_dirty(handle1, bh)
      6) handle1 is closed
      7) start committing transaction1, opening transaction2
      8) handle2 is opened
      9) journal_access(handle2, bh)
          - This copies off b_frozen_data to make it safe for transaction1 to commit.
            jh->b_next_transaction is set to transaction2.
      10) jbd2_journal_write_metadata() checksums b_frozen_data
      11) the journal correctly writes b_frozen_data to the disk journal
      12) handle2 is closed
          - There was no dirty call for the bh on handle2, so it is never queued for
            any more journal operation
      13) Checkpointing finally happens, and it just spools the bh via normal buffer
      writeback.  This will write b_data, which was never triggered on and thus
      contains a wrong (old) checksum.
      
      This patch fixes the problem by calling the trigger at the moment data is
      frozen for journal commit - i.e., either when b_frozen_data is created by
      do_get_write_access or just before we write a buffer to the log if
      b_frozen_data does not exist. We also rename the trigger to t_frozen as
      that better describes when it is called.
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NMark Fasheh <mfasheh@suse.com>
      Signed-off-by: NJoel Becker <joel.becker@oracle.com>
      13ceef09
    • W
      ocfs2/dlm: Remove BUG_ON from migration in the rare case of a down node · a39953dd
      Wengang Wang 提交于
      For migration, we are waiting for DLM_LOCK_RES_MIGRATING flag to be set
      before sending DLM_MIG_LOCKRES_MSG message to the target. We are using
      dlm_migration_can_proceed() for that purpose.  However, if the node is
      down, dlm_migration_can_proceed() will also return "go ahead".  In this
      rare case, the DLM_LOCK_RES_MIGRATING flag might not be set yet. Remove
      the BUG_ON() that trips over this condition.
      Signed-off-by: NWengang Wang <wen.gang.wang@oracle.com>
      Signed-off-by: NJoel Becker <joel.becker@oracle.com>
      a39953dd
    • T
      ocfs2: Don't duplicate pages past i_size during CoW. · f5e27b6d
      Tao Ma 提交于
      During CoW, the pages after i_size don't contain valid data, so there's
      no need to read and duplicate them.
      Signed-off-by: NTao Ma <tao.ma@oracle.com>
      Signed-off-by: NJoel Becker <joel.becker@oracle.com>
      f5e27b6d
  4. 15 7月, 2010 5 次提交
    • B
      GFS2: rename causes kernel Oops · 728a756b
      Bob Peterson 提交于
      This patch fixes a kernel Oops in the GFS2 rename code.
      
      The problem was in the way the gfs2 directory code was trying
      to re-use sentinel directory entries.
      
      In the failing case, gfs2's rename function was renaming a
      file to another name that had the same non-trivial length.
      The file being renamed happened to be the first directory
      entry on the leaf block.
      
      First, the rename code (gfs2_rename in ops_inode.c) found the
      original directory entry and decided it could do its job by
      simply replacing the directory entry with another.  Therefore
      it determined correctly that no block allocations were needed.
      
      Next, the rename code deleted the old directory entry prior to
      replacing it with the new name.  Therefore, the soon-to-be
      replaced directory entry was temporarily made into a directory
      entry "sentinel" or a place holder at the start of a leaf block.
      
      Lastly, it went to re-add the replacement directory entry in
      that leaf block.  However, when gfs2_dirent_find_space was
      looking for space in the leaf block, it used the wrong value
      for the sentinel.  That threw off its calculations so later
      it decides it can't really re-use the sentinel and therefore
      must allocate a new leaf block.  But because it previously decided
      to re-use the directory entry, it didn't waste the time to
      grab a new block allocation for the inode.  Therefore, the
      inode's i_alloc pointer was still NULL and it crashes trying to
      reference it.
      
      In the case of sentinel directory entries, the entire dirent is
      reused, not just the "free space" portion of it, and therefore
      the function gfs2_dirent_find_space should use the value 0
      rather than GFS2_DIRENT_SIZE(0) for the actual dirent size.
      
      Fixing this calculation enables the reproducer programs to work
      properly.
      Signed-off-by: NBob Peterson <rpeterso@redhat.com>
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      728a756b
    • A
      GFS2: BUG in gfs2_adjust_quota · 8b421601
      Abhijith Das 提交于
      HighMem pages on i686 do not get mapped to the buffer_heads and this was
      causing a NULL pointer dereference when we were trying to memset page buffers
      to zero.
      We now use zero_user() that kmaps the page and directly manipulates page data.
      This patch also fixes a boundary condition that was incorrect.
      Signed-off-by: NAbhi Das <adas@redhat.com>
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      8b421601
    • B
      GFS2: Fix kernel NULL pointer dereference by dlm_astd · b1becbde
      Bob Peterson 提交于
      This patch fixes a problem in an error path when looking
      up dinodes.  There are two sister-functions, gfs2_inode_lookup
      and gfs2_process_unlinked_inode.  Both functions acquire and
      hold the i_iopen glock for the dinode being looked up. The last
      thing they try to do is hold the i_gl glock for the dinode.
      If that glock fails for some reason, the error path was
      incorrectly calling gfs2_glock_put for the i_iopen glock twice.
      This resulted in the glock being prematurely freed.  The
      "minimum hold time" usually kept the glock in memory, but the
      lock interface to dlm (aka lock_dlm) freed its memory for the
      glock.  In some circumstances, it would cause dlm's dlm_astd daemon
      to try to call the bast function for the freed lock_dlm memory,
      which resulted in a NULL pointer dereference.
      Signed-off-by: NBob Peterson <rpeterso@redhat.com>
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      b1becbde
    • B
      GFS2: recovery stuck on transaction lock · b7dc2df5
      Bob Peterson 提交于
      This patch fixes bugzilla bug #590878: GFS2: recovery stuck on
      transaction lock.  We set the frozen flag on the glock when we receive
      a completion that cannot be delivered due to blocked locks. At that
      point we check to see whether the first waiting holder has the noexp
      flag set. If the noexp lock is queued later, then we need to unfreeze
      the glock at that point in time, namely, in the glock work function.
      
      This patch was originally written by Steve Whitehouse, but since
      he's on holiday, I'm submitting it.  It's been well tested with a
      complex recovery test called revolver.
      Signed-off-by: NSteve Whitehouse <swhiteho@redhat.com>
      Signed-off-by: NBob Peterson <rpeterso@redhat.com>
      b7dc2df5
    • B
      GFS2: O_TRUNC not working on stuffed files across cluster · a8bf2bc2
      Bob Peterson 提交于
      This patch replaces a statement that got dropped out by accident.
      Without the patch, truncates on stuffed (very small) files cause
      those files to have an unpredictable size.
      Signed-off-by: NBob Peterson <rpeterso@redhat.com>
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      a8bf2bc2
  5. 13 7月, 2010 6 次提交
    • D
      ocfs2: tighten up strlen() checking · e372357b
      Dan Carpenter 提交于
      This function is only called from one place and it's like this:
      	dlm_register_domain(conn->cc_name, dlm_key, &fs_version);
      
      The "conn->cc_name" is 64 characters long.  If strlen(conn->cc_name)
      were equal to O2NM_MAX_NAME_LEN (64) that would be a bug because
      strlen() doesn't count the NULL character.
      
      In fact, if you look how O2NM_MAX_NAME_LEN is used, it mostly describes
      64 character buffers.  The only exception is nd_name from struct
      o2nm_node.
      
      Anyway I looked into it and in this case the domain string comes from
      osb->uuid_str in ocfs2_setup_osb_uuid().  That's 32 characters and NULL
      which easily fits into O2NM_MAX_NAME_LEN.  This patch doesn't change how
      the code works, but I think it makes the code a little cleaner.
      Signed-off-by: NDan Carpenter <error27@gmail.com>
      Signed-off-by: NJoel Becker <joel.becker@oracle.com>
      e372357b
    • T
      ocfs2: Make xattr reflink work with new local alloc reservation. · 121a39bb
      Tao Ma 提交于
      The new reservation code in local alloc has add the limitation
      that the caller should handle the case that the local alloc
      doesn't give use enough contiguous clusters. It make the old
      xattr reflink code broken.
      
      So this patch udpate the xattr reflink code so that it can
      handle the case that local alloc give us one cluster at a time.
      Signed-off-by: NTao Ma <tao.ma@oracle.com>
      Signed-off-by: NJoel Becker <joel.becker@oracle.com>
      121a39bb
    • T
      ocfs2: make xattr extension work with new local alloc reservation. · a78f9f46
      Tao Ma 提交于
      The old ocfs2_xattr_extent_allocation is too optimistic about
      the clusters we can get. So actually if the file system is
      too fragmented, ocfs2_add_clusters_in_btree will return us
      with EGAIN and we need to allocate clusters once again.
      
      So this patch change it to a while loop so that we can allocate
      clusters until we reach clusters_to_add.
      Signed-off-by: NTao Ma <tao.ma@oracle.com>
      Signed-off-by: NJoel Becker <joel.becker@oracle.com>
      Cc: stable@kernel.org
      a78f9f46
    • T
      ocfs2: Remove the redundant cpu_to_le64. · 0a463b74
      Tao Ma 提交于
      In ocfs2_block_group_alloc, we set c_blkno by bg->bg_blkno.
      But actually bg->bg_blkno is already changed to little endian
      in ocfs2_block_group_fill. So remove the extra cpu_to_le64.
      Reported-by: NMarcos Matsunaga <Marcos.Matsunaga@oracle.com>
      Signed-off-by: NTao Ma <tao.ma@oracle.com>
      Signed-off-by: NJoel Becker <joel.becker@oracle.com>
      0a463b74
    • W
      ocfs2/dlm: don't access beyond bitmap size · f471c9df
      Wengang Wang 提交于
      dlm->recovery_map is defined as
      	unsigned long recovery_map[BITS_TO_LONGS(O2NM_MAX_NODES)];
      
      We should treat O2NM_MAX_NODES as the bit map size in bits.
      This patches fixes a bit operation that takes O2NM_MAX_NODES + 1 as bitmap size.
      Signed-off-by: NWengang Wang <wen.gang.wang@oracle.com>
      Signed-off-by: NJoel Becker <joel.becker@oracle.com>
      f471c9df
    • J
      ocfs2: No need to zero pages past i_size. · 693c241a
      Joel Becker 提交于
      When ocfs2 fills a hole, it does so by allocating clusters.  When a
      cluster is larger than the write, ocfs2 must zero the portions of the
      cluster outside of the write.  If the clustersize is smaller than a
      pagecache page, this is handled by the normal pagecache mechanisms, but
      when the clustersize is larger than a page, ocfs2's write code will zero
      the pages adjacent to the write.  This makes sure the entire cluster is
      zeroed correctly.
      
      Currently ocfs2 behaves exactly the same when writing past i_size.
      However, this means ocfs2 is writing zeroed pages for portions of a new
      cluster that are beyond i_size.  The page writeback code isn't expecting
      this.  It treats all pages past the one containing i_size as left behind
      due to a previous truncate operation.
      
      Thankfully, ocfs2 calculates the number of pages it will be working on
      up front.  The rest of the write code merely honors the original
      calculation.  We can simply trim the number of pages to only cover the
      actual file data.
      Signed-off-by: NJoel Becker <joel.becker@oracle.com>
      Cc: stable@kernel.org
      693c241a
  6. 09 7月, 2010 2 次提交
    • J
      ocfs2: Zero the tail cluster when extending past i_size. · 5693486b
      Joel Becker 提交于
      ocfs2's allocation unit is the cluster.  This can be larger than a block
      or even a memory page.  This means that a file may have many blocks in
      its last extent that are beyond the block containing i_size.  There also
      may be more unwritten extents after that.
      
      When ocfs2 grows a file, it zeros the entire cluster in order to ensure
      future i_size growth will see cleared blocks.  Unfortunately,
      block_write_full_page() drops the pages past i_size.  This means that
      ocfs2 is actually leaking garbage data into the tail end of that last
      cluster.  This is a bug.
      
      We adjust ocfs2_write_begin_nolock() and ocfs2_extend_file() to detect
      when a write or truncate is past i_size.  They will use
      ocfs2_zero_extend() to ensure the data is properly zeroed.
      
      Older versions of ocfs2_zero_extend() simply zeroed every block between
      i_size and the zeroing position.  This presumes three things:
      
      1) There is allocation for all of these blocks.
      2) The extents are not unwritten.
      3) The extents are not refcounted.
      
      (1) and (2) hold true for non-sparse filesystems, which used to be the
      only users of ocfs2_zero_extend().  (3) is another bug.
      
      Since we're now using ocfs2_zero_extend() for sparse filesystems as
      well, we teach ocfs2_zero_extend() to check every extent between
      i_size and the zeroing position.  If the extent is unwritten, it is
      ignored.  If it is refcounted, it is CoWed.  Then it is zeroed.
      Signed-off-by: NJoel Becker <joel.becker@oracle.com>
      Cc: stable@kernel.org
      5693486b
    • J
      ocfs2: When zero extending, do it by page. · a4bfb4cf
      Joel Becker 提交于
      ocfs2_zero_extend() does its zeroing block by block, but it calls a
      function named ocfs2_write_zero_page().  Let's have
      ocfs2_write_zero_page() handle the page level.  From
      ocfs2_zero_extend()'s perspective, it is now page-at-a-time.
      Signed-off-by: NJoel Becker <joel.becker@oracle.com>
      Cc: stable@kernel.org
      a4bfb4cf
  7. 06 7月, 2010 4 次提交
    • C
      writeback: simplify the write back thread queue · 83ba7b07
      Christoph Hellwig 提交于
      First remove items from work_list as soon as we start working on them.  This
      means we don't have to track any pending or visited state and can get
      rid of all the RCU magic freeing the work items - we can simply free
      them once the operation has finished.  Second use a real completion for
      tracking synchronous requests - if the caller sets the completion pointer
      we complete it, otherwise use it as a boolean indicator that we can free
      the work item directly.  Third unify struct wb_writeback_args and struct
      bdi_work into a single data structure, wb_writeback_work.  Previous we
      set all parameters into a struct wb_writeback_args, copied it into
      struct bdi_work, copied it again on the stack to use it there.  Instead
      of just allocate one structure dynamically or on the stack and use it
      all the way through the stack.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NJens Axboe <jaxboe@fusionio.com>
      83ba7b07
    • C
      writeback: split writeback_inodes_wb · edadfb10
      Christoph Hellwig 提交于
      The case where we have a superblock doesn't require a loop here as we scan
      over all inodes in writeback_sb_inodes. Split it out into a separate helper
      to make the code simpler.  This also allows to get rid of the sb member in
      struct writeback_control, which was rather out of place there.
      
      Also update the comments in writeback_sb_inodes that explain the handling
      of inodes from wrong superblocks.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NJens Axboe <jaxboe@fusionio.com>
      edadfb10
    • C
      writeback: remove writeback_inodes_wbc · 9c3a8ee8
      Christoph Hellwig 提交于
      This was just an odd wrapper around writeback_inodes_wb.  Removing this
      also allows to get rid of the bdi member of struct writeback_control
      which was rather out of place there.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NJens Axboe <jaxboe@fusionio.com>
      9c3a8ee8
    • S
      ceph: fix crush device 'out' threshold to 1.0, not 0.1 · 153a1093
      Sage Weil 提交于
      Fix a typo that made any OSD weighted between 0.1 and 1.0 effectively
      weighted as 1.0 (fully in).
      Signed-off-by: NSage Weil <sage@newdream.net>
      153a1093
  8. 01 7月, 2010 1 次提交
  9. 30 6月, 2010 9 次提交
  10. 28 6月, 2010 1 次提交
  11. 25 6月, 2010 5 次提交
  12. 24 6月, 2010 2 次提交