1. 14 7月, 2012 4 次提交
  2. 02 7月, 2012 9 次提交
    • D
      xfs: factor buffer reading from xfs_dir2_leaf_getdents · 9b73bd7b
      Dave Chinner 提交于
      The buffer reading code in xfs_dir2_leaf_getdents is complex and difficult to
      follow due to the readahead and all the context is carries. it is also badly
      indented and so difficult to read. Factor it out into a separate function to
      make it easier to understand and optimise in future patches.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      9b73bd7b
    • D
      xfs: remove struct xfs_dabuf and infrastructure · 1d9025e5
      Dave Chinner 提交于
      The struct xfs_dabuf now only tracks a single xfs_buf and all the
      information it holds can be gained directly from the xfs_buf. Hence
      we can remove the struct dabuf and pass the xfs_buf around
      everywhere.
      
      Kill the struct dabuf and the associated infrastructure.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      1d9025e5
    • D
      xfs: use discontiguous xfs_buf support in dabuf wrappers · 3605431f
      Dave Chinner 提交于
      First step in converting the directory code to use native
      discontiguous buffers and replacing the dabuf construct.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      3605431f
    • D
      xfs: support discontiguous buffers in the xfs_buf_log_item · 372cc85e
      Dave Chinner 提交于
      discontigous buffer in separate buffer format structures. This means log
      recovery will recover all the changes on a per segment basis without
      requiring any knowledge of the fact that it was logged from a
      compound buffer.
      
      To do this, we need to be able to determine what buffer segment any
      given offset into the compound buffer sits over. This enables us to
      translate the dirty bitmap in the number of separate buffer format
      structures required.
      
      We also need to be able to determine the number of bitmap elements
      that a given buffer segment has, as this determines the size of the
      buffer format structure. Hence we need to be able to determine the
      both the start offset into the buffer and the length of a given
      segment to be able to calculate this.
      
      With this information, we can preallocate, build and format the
      correct log vector array for each segment in a compound buffer to
      appear exactly the same as individually logged buffers in the log.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      372cc85e
    • D
      xfs: add discontiguous buffer support to transactions · de2a4f59
      Dave Chinner 提交于
      Now that the buffer cache supports discontiguous buffers, add
      support to the transaction buffer interface for getting and reading
      buffers.
      
      Note that this patch does not convert the buffer item logging to
      support discontiguous buffers. That will be done as a separate
      commit.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      de2a4f59
    • D
      xfs: add discontiguous buffer map interface · 6dde2707
      Dave Chinner 提交于
      With the internal interfaces supporting discontiguous buffer maps,
      add external lookup, read and get interfaces so they can start to be
      used.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      6dde2707
    • D
      xfs: convert internal buffer functions to pass maps · 3e85c868
      Dave Chinner 提交于
      While the external interface currently uses separate blockno/length
      variables, we need to move internal interfaces to passing and
      parsing vector maps. This will then allow us to add external
      interfaces to support discontiguous buffer maps as the internal code
      will already support them.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      3e85c868
    • D
      xfs: separate buffer indexing from block map · cbb7baab
      Dave Chinner 提交于
      To support discontiguous buffers in the buffer cache, we need to
      separate the cache index variables from the I/O map. While this is
      currently a 1:1 mapping, discontiguous buffer support will break
      this relationship.
      
      However, for caching purposes, we can still treat them the same as a
      contiguous buffer - the block number of the first block and the
      length of the buffer - as that is still a unique representation.
      Also, the only way we will ever access the discontiguous regions of
      buffers is via bulding the complete buffer in the first place, so
      using the initial block number and entire buffer length is a sane
      way to index the buffers.
      
      Add a block mapping vector construct to the xfs_buf and use it in
      the places where we are doing IO instead of the current
      b_bn/b_length variables.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      cbb7baab
    • D
      xfs: struct xfs_buf_log_format isn't variable sized. · 77c1a08f
      Dave Chinner 提交于
      The struct xfs_buf_log_format wants to think the dirty bitmap is
      variable sized.  In fact, it is variable size on disk simply due to
      the way we map it from the in-memory structure, but we still just
      use a fixed size memory allocation for the in-memory structure.
      
      Hence it makes no sense to set the function up as a variable sized
      structure when we already know it's maximum size, and we always
      allocate it as such. Simplify the structure by making the dirty
      bitmap a fixed sized array and just using the size of the structure
      for the allocation size.
      
      This will make it much simpler to allocate and manipulate an array
      of format structures for discontiguous buffer support.
      
      The previous struct xfs_buf_log_item size according to
      /proc/slabinfo was 224 bytes. pahole doesn't give the same size
      because of the variable size definition. With this modification,
      pahole reports the same as /proc/slabinfo:
      
      	/* size: 224, cachelines: 4, members: 6 */
      
      Because the xfs_buf_log_item size is now determined by the maximum
      supported block size we introduce a dependency on xfs_alloc_btree.h.
      Avoid this dependency by moving the idefines for the maximum block
      sizes supported to xfs_types.h with all the other max/min type
      defines to avoid any new dependencies.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      77c1a08f
  3. 22 6月, 2012 5 次提交
  4. 21 6月, 2012 1 次提交
  5. 15 6月, 2012 6 次提交
    • C
      xfs: fix typo in comment of xfs_dinode_t. · 51c84223
      Chen Baozi 提交于
      There should be "XFS_DFORK_DPTR, XFS_DFORK_APTR, and XFS_DFORK_PTR" instead
      of "XFS_DFORK_PTR, XFS_DFORK_DPTR, and XFS_DFORK_PTR".
      Signed-off-by: NChen Baozi <baozich@gmail.com>
      Reviewed-by: NMark Tinguely <tinguely@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      51c84223
    • D
      xfs: kill copy and paste segment checks in xfs_file_aio_read · 52764329
      Dave Chinner 提交于
      The generic segment check code now returns a count of the number of
      bytes in the iovec, so we don't need to roll our own anymore.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      52764329
    • D
      xfs: make largest supported offset less shouty · 32972383
      Dave Chinner 提交于
      XFS_MAXIOFFSET() is just a simple macro that resolves to
      mp->m_maxioffset. It doesn't need to exist, and it just makes the
      code unnecessarily loud and shouty.
      
      Make it quiet and easy to read.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NEric Sandeen <sandeen@redhat.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      32972383
    • D
      xfs: m_maxioffset is redundant · d2c28191
      Dave Chinner 提交于
      The m_maxioffset field in the struct xfs_mount contains the same
      value as the superblock s_maxbytes field. There is no need to carry
      two copies of this limit around, so use the VFS superblock version.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NEric Sandeen <sandeen@redhat.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      d2c28191
    • J
      xfs: fix debug_object WARN at xfs_alloc_vextent() · 0f2cf9d3
      Jeff Liu 提交于
      Fengguang reports:
      
      [  780.529603] XFS (vdd): Ending clean mount
      [  781.454590] ODEBUG: object is on stack, but not annotated
      [  781.455433] ------------[ cut here ]------------
      [  781.455433] WARNING: at /c/kernel-tests/sound/lib/debugobjects.c:301 __debug_object_init+0x173/0x1f1()
      [  781.455433] Hardware name: Bochs
      [  781.455433] Modules linked in:
      [  781.455433] Pid: 26910, comm: kworker/0:2 Not tainted 3.4.0+ #51
      [  781.455433] Call Trace:
      [  781.455433]  [<ffffffff8106bc84>] warn_slowpath_common+0x83/0x9b
      [  781.455433]  [<ffffffff8106bcb6>] warn_slowpath_null+0x1a/0x1c
      [  781.455433]  [<ffffffff814919a5>] __debug_object_init+0x173/0x1f1
      [  781.455433]  [<ffffffff81491c65>] debug_object_init+0x14/0x16
      [  781.455433]  [<ffffffff8108842a>] __init_work+0x20/0x22
      [  781.455433]  [<ffffffff8134ea56>] xfs_alloc_vextent+0x6c/0xd5
      
      Use INIT_WORK_ONSTACK in xfs_alloc_vextent instead of INIT_WORK.
      Reported-by: NWu Fengguang <wfg@linux.intel.com>
      Signed-off-by: NJie Liu <jeff.liu@oracle.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      0f2cf9d3
    • A
      xfs: xfs_vm_writepage clear iomap_valid when !buffer_uptodate (REV2) · 7d0fa3ec
      Alain Renaud 提交于
      On filesytems with a block size smaller than PAGE_SIZE we currently have
      a problem with unwritten extents.  If a we have multi-block page for
      which an unwritten extent has been allocated, and only some of the
      buffers have been written to, and they are not contiguous, we can expose
      stale data from disk in the blocks between the writes after extent
      conversion.
      
      Example of a page with unwritten and real data.
      buffer  content
      0       empty  b_state = 0
      1       DATA   b_state = 0x1023 Uptodate,Dirty,Mapped,Unwritten
      2       DATA   b_state = 0x1023 Uptodate,Dirty,Mapped,Unwritten
      3       empty  b_state = 0
      4       empty  b_state = 0
      5       DATA   b_state = 0x1023 Uptodate,Dirty,Mapped,Unwritten
      6       DATA   b_state = 0x1023 Uptodate,Dirty,Mapped,Unwritten
      7       empty  b_state = 0
      
      Buffers 1, 2, 5, and 6 have been written to, leaving 0, 3, 4, and 7
      empty.  Currently buffers 1, 2, 5, and 6 are added to a single ioend,
      and when IO has completed, extent conversion creates a real extent from
      block 1 through block 6, leaving 0 and 7 unwritten.  However buffers 3
      and 4 were not written to disk, so stale data is exposed from those
      blocks on a subsequent read.
      
      Fix this by setting iomap_valid = 0 when we find a buffer that is not
      Uptodate.  This ensures that buffers 5 and 6 are not added to the same
      ioend as buffers 1 and 2.  Later these blocks will be converted into two
      separate real extents, leaving the blocks in between unwritten.
      Signed-off-by: NAlain Renaud <arenaud@sgi.com>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      7d0fa3ec
  6. 03 6月, 2012 13 次提交
    • L
      Linux 3.5-rc1 · f8f5701b
      Linus Torvalds 提交于
      f8f5701b
    • L
      Merge tag 'dm-3.5-changes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/agk/linux-dm · 912afc36
      Linus Torvalds 提交于
      Pull device-mapper updates from Alasdair G Kergon:
       "Improve multipath's retrying mechanism in some defined circumstances
        and provide a simple reserve/release mechanism for userspace tools to
        access thin provisioning metadata while the pool is in use."
      
      * tag 'dm-3.5-changes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/agk/linux-dm:
        dm thin: provide userspace access to pool metadata
        dm thin: use slab mempools
        dm mpath: allow ioctls to trigger pg init
        dm mpath: delay retry of bypassed pg
        dm mpath: reduce size of struct multipath
      912afc36
    • J
      dm thin: provide userspace access to pool metadata · cc8394d8
      Joe Thornber 提交于
      This patch implements two new messages that can be sent to the thin
      pool target allowing it to take a snapshot of the _metadata_.  This,
      read-only snapshot can be accessed by userland, concurrently with the
      live target.
      
      Only one metadata snapshot can be held at a time.  The pool's status
      line will give the block location for the current msnap.
      
      Since version 0.1.5 of the userland thin provisioning tools, the
      thin_dump program displays the msnap as follows:
      
          thin_dump -m <msnap root> <metadata dev>
      
      Available here: https://github.com/jthornber/thin-provisioning-tools
      
      Now that userland can access the metadata we can do various things
      that have traditionally been kernel side tasks:
      
           i) Incremental backups.
      
           By using metadata snapshots we can work out what blocks have
           changed over time.  Combined with data snapshots we can ensure
           the data doesn't change while we back it up.
      
           A short proof of concept script can be found here:
      
           https://github.com/jthornber/thinp-test-suite/blob/master/incremental_backup_example.rb
      
           ii) Migration of thin devices from one pool to another.
      
           iii) Merging snapshots back into an external origin.
      
           iv) Asyncronous replication.
      Signed-off-by: NJoe Thornber <ejt@redhat.com>
      Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
      cc8394d8
    • M
      dm thin: use slab mempools · a24c2569
      Mike Snitzer 提交于
      Use dedicated caches prefixed with a "dm_" name rather than relying on
      kmalloc mempools backed by generic slab caches so the memory usage of
      thin provisioning (and any leaks) can be accounted for independently.
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
      a24c2569
    • M
      dm mpath: allow ioctls to trigger pg init · 35991652
      Mikulas Patocka 提交于
      After the failure of a group of paths, any alternative paths that
      need initialising do not become available until further I/O is sent to
      the device.  Until this has happened, ioctls return -EAGAIN.
      
      With this patch, new paths are made available in response to an ioctl
      too.  The processing of the ioctl gets delayed until this has happened.
      
      Instead of returning an error, we submit a work item to kmultipathd
      (that will potentially activate the new path) and retry in ten
      milliseconds.
      
      Note that the patch doesn't retry an ioctl if the ioctl itself fails due
      to a path failure.  Such retries should be handled intelligently by the
      code that generated the ioctl in the first place, noting that some SCSI
      commands should not be retried because they are not idempotent (XOR write
      commands).  For commands that could be retried, there is a danger that
      if the device rejected the SCSI command, the path could be errorneously
      marked as failed, and the request would be retried on another path which
      might fail too.  It can be determined if the failure happens on the
      device or on the SCSI controller, but there is no guarantee that all
      SCSI drivers set these flags correctly.
      Signed-off-by: NMikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
      35991652
    • M
      dm mpath: delay retry of bypassed pg · f220fd4e
      Mike Christie 提交于
      If I/O needs retrying and only bypassed priority groups are available,
      set the pg_init_delay_retry flag to wait before retrying.
      
      If, for example, the reason for the bypass is that the controller is
      getting reset or there is a firmware upgrade happening, retrying right
      away would cause a flood of log messages and retries for what could be a
      few seconds or even several minutes.
      Signed-off-by: NMike Christie <michaelc@cs.wisc.edu>
      Acked-by: NMike Snitzer <snitzer@redhat.com>
      Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
      f220fd4e
    • M
      dm mpath: reduce size of struct multipath · 1fbdd2b3
      Mike Snitzer 提交于
      Move multipath structure's 'lock' and 'queue_size' members to eliminate
      two 4-byte holes.  Also use a bit within a single unsigned int for each
      existing flag (saves 8-bytes).  This allows future flags to be added
      without each consuming an unsigned int.
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      Acked-by: NHannes Reinecke <hare@suse.de>
      Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
      1fbdd2b3
    • L
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 4fc3acf2
      Linus Torvalds 提交于
      Pull networking updates from David Miller:
      
       1) Make syn floods consume significantly less resources by
      
          a) Not pre-COW'ing routing metrics for SYN/ACKs
          b) Mirroring the device queue mapping of the SYN for the SYN/ACK
             reply.
      
          Both from Eric Dumazet.
      
       2) Fix calculation errors in Byte Queue Limiting, from Hiroaki SHIMODA.
      
       3) Validate the length requested when building a paged SKB for a
          socket, so we don't overrun the page vector accidently.  From Jason
          Wang.
      
       4) When netlabel is disabled, we abort all IP option processing when we
          see a CIPSO option.  This isn't the right thing to do, we should
          simply skip over it and continue processing the remaining options
          (if any).  Fix from Paul Moore.
      
       5) SRIOV fixes for the mellanox driver from Jack orgenstein and Marcel
          Apfelbaum.
      
       6) 8139cp enables the receiver before the ring address is properly
          programmed, which potentially lets the device crap over random
          memory.  Fix from Jason Wang.
      
       7) e1000/e1000e fixes for i217 RST handling, and an improper buffer
          address reference in jumbo RX frame processing from Bruce Allan and
          Sebastian Andrzej Siewior, respectively.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
        fec_mpc52xx: fix timestamp filtering
        mcs7830: Implement link state detection
        e1000e: fix Rapid Start Technology support for i217
        e1000: look into the page instead of skb->data for e1000_tbi_adjust_stats()
        r8169: call netif_napi_del at errpaths and at driver unload
        tcp: reflect SYN queue_mapping into SYNACK packets
        tcp: do not create inetpeer on SYNACK message
        8139cp/8139too: terminate the eeprom access with the right opmode
        8139cp: set ring address before enabling receiver
        cipso: handle CIPSO options correctly when NetLabel is disabled
        net: sock: validate data_len before allocating skb in sock_alloc_send_pskb()
        bql: Avoid possible inconsistent calculation.
        bql: Avoid unneeded limit decrement.
        bql: Fix POSDIFF() to integer overflow aware.
        net/mlx4_core: Fix obscure mlx4_cmd_box parameter in QUERY_DEV_CAP
        net/mlx4_core: Check port out-of-range before using in mlx4_slave_cap
        net/mlx4_core: Fixes for VF / Guest startup flow
        net/mlx4_en: Fix improper use of "port" parameter in mlx4_en_event
        net/mlx4_core: Fix number of EQs used in ICM initialisation
        net/mlx4_core: Fix the slave_id out-of-range test in mlx4_eq_int
      4fc3acf2
    • L
      Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 63004afa
      Linus Torvalds 提交于
      Pull straggler x86 fixes from Peter Anvin:
       "Three groups of patches:
      
        - EFI boot stub documentation and the ability to print error messages;
        - Removal for PTRACE_ARCH_PRCTL for x32 (obsolete interface which
          should never have been ported, and the port is broken and
          potentially dangerous.)
        - ftrace stack corruption fixes.  I'm not super-happy about the
          technical implementation, but it is probably the least invasive in
          the short term.  In the future I would like a single method for
          nesting the debug stack, however."
      
      * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86, x32, ptrace: Remove PTRACE_ARCH_PRCTL for x32
        x86, efi: Add EFI boot stub documentation
        x86, efi; Add EFI boot stub console support
        x86, efi: Only close open files in error path
        ftrace/x86: Do not change stacks in DEBUG when calling lockdep
        x86: Allow nesting of the debug stack IDT setting
        x86: Reset the debug_stack update counter
        ftrace: Use breakpoint method to update ftrace caller
        ftrace: Synchronize variable setting with breakpoints
      63004afa
    • L
      tty: Revert the tty locking series, it needs more work · f309532b
      Linus Torvalds 提交于
      This reverts the tty layer change to use per-tty locking, because it's
      not correct yet, and fixing it will require some more deep surgery.
      
      The main revert is d29f3ef3 ("tty_lock: Localise the lock"), but
      there are several smaller commits that built upon it, they also get
      reverted here. The list of reverted commits is:
      
        fde86d31 - tty: add lockdep annotations
        8f6576ad - tty: fix ldisc lock inversion trace
        d3ca8b64 - pty: Fix lock inversion
        b1d679af - tty: drop the pty lock during hangup
        abcefe5f - tty/amiserial: Add missing argument for tty_unlock()
        fd11b42e - cris: fix missing tty arg in wait_event_interruptible_tty call
        d29f3ef3 - tty_lock: Localise the lock
      
      The revert had a trivial conflict in the 68360serial.c staging driver
      that got removed in the meantime.
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      f309532b
    • S
      fec_mpc52xx: fix timestamp filtering · 9ca3cc6f
      Stephan Gatzka 提交于
      skb_defer_rx_timestamp was called with a freshly allocated skb but must
      be called with rskb instead.
      Signed-off-by: NStephan Gatzka <stephan@gatzka.org>
      Cc: stable <stable@vger.kernel.org>
      Acked-by: NRichard Cochran <richardcochran@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9ca3cc6f
    • O
      mcs7830: Implement link state detection · b1ff4f96
      Ondrej Zary 提交于
      Add .status callback that detects link state changes.
      Tested with MCS7832CV-AA chip (9710:7830, identified as rev.C by the driver).
      Fixes https://bugzilla.kernel.org/show_bug.cgi?id=28532Signed-off-by: NOndrej Zary <linux@rainbow-software.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b1ff4f96
    • L
      Merge 'for-linus' branches from git://git.kernel.org/pub/scm/linux/kernel/git/viro/{vfs,signal} · 233e562e
      Linus Torvalds 提交于
      Pull vfs fix and a fix from the signal changes for frv from Al Viro.
      
      The __kernel_nlink_t for powerpc got scrogged because 64-bit powerpc
      actually depended on the default "unsigned long", while 32-bit powerpc
      had an explicit override to "unsigned short".  Al didn't notice, and
      made both of them be the unsigned short.
      
      The frv signal fix is fallout from simplifying the do_notify_resume()
      code, and leaving an extra parenthesis.
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
        powerpc: Fix size of st_nlink on 64bit
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal:
        frv: Remove bogus closing parenthesis
      233e562e
  7. 02 6月, 2012 2 次提交