1. 28 Feb, 2010 11 commits
    • exofs: groups support · 50a76fd3
      Boaz Harrosh committed
      * _calc_stripe_info() changes to accommodate grouping
        calculations. It now returns additional information.
      
      * old _prepare_pages() becomes _prepare_one_group()
        which stores pages belonging to one device group.
      
      * The new _prepare_for_striping() iterates over all groups, calling
        _prepare_one_group().
      
      * Enable mounting of groups data_maps (group_width != 0)
      
      [QUESTION]
      Which is faster, A or B?
      A.	x += stride;
      	x = x % width + first_x;
      
      B.	x += stride;
      	if (x > last_x)
      		x = first_x;
      Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
      50a76fd3
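The [QUESTION] in the commit message above compares two ways of wrapping an index back into a window of devices. A minimal sketch in C of both, for illustration only: the function names are hypothetical, and it assumes `first_x` is the first index of the window, `width` its size, and `last_x = first_x + width - 1`, with variant B meant to reset once `x` steps past `last_x`. None of this is taken from the exofs source.

```c
#include <assert.h>

/* Variant A (assumed reading): modulo-based wrap inside the window
 * [first_x, first_x + width). The remainder of the step is preserved. */
static unsigned wrap_mod(unsigned x, unsigned stride,
                         unsigned first_x, unsigned width)
{
    assert(width > 0 && x >= first_x && x < first_x + width);
    return first_x + (x - first_x + stride) % width;
}

/* Variant B (assumed reading): compare-and-reset. Any step that leaves
 * the window restarts at first_x, discarding the remainder. */
static unsigned wrap_cmp(unsigned x, unsigned stride,
                         unsigned first_x, unsigned width)
{
    unsigned last_x = first_x + width - 1;

    assert(width > 0 && x >= first_x && x <= last_x);
    x += stride;
    if (x > last_x)
        x = first_x;
    return x;
}
```

The two variants agree only when a step lands exactly one past the window, for example when `stride` divides `width` and `x - first_x` stays a multiple of `stride`. Otherwise the modulo form keeps the remainder while the compare form restarts at `first_x`. Which is faster depends on the target, since the modulo generally needs a division unless `width` is a power of two.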
    • exofs: Prepare for groups · b367e78b
      Boaz Harrosh committed
      * Rename _offset_dev_unit_off() to _calc_stripe_info()
        and have it receive a struct for the output params
      
      * In _prepare_for_striping we only need to call
        _calc_stripe_info() once. The other components
        are easy to calculate from that. This code
        was inspired by what's done in truncate.
      
      * Some code shifts that make sense now but will make
        more sense when group support is added.
      Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
      b367e78b
    • exofs: Error recovery if object is missing from storage · 96391e2b
      Boaz Harrosh committed
      If an object is referenced by a directory but does not
      exist on a target, it is a very serious corruption that
      means one of two things:
      1. A power failure, though the chance of it happening is very
        slim, because the directory update is always submitted
        long after object creation. If a directory is written
        to one device and the object creation to another, it might
        theoretically happen.
      2. A bug. It only ever happened to me while developing, with BUGs
        causing file corruption. Crashes could also cause it, but
        they are more like case 1.
      
      Either way the object does not exist, so data is surely lost.
      If there is a mix-up in the obj-id or data-map, then lost objects
      can be salvaged by an off-line fsck. The only recoverable information
      is the directory name. By letting it appear as a regular empty file,
      with date==0 (1970 Jan 1st) and ownership set to root, we enable recovery
      of the only useful information, and also enable deletion or over-write.
      I can see how this can hurt.
      Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
      96391e2b
    • exofs: convert io_state to use pages array instead of bio at input · 86093aaf
      Boaz Harrosh committed
      * inode.c operations are full-pages based, and not actually
        true scatter-gather
      * Lets us use more pages at once, up to 512 (from 249) on 64-bit
      * Brings us much closer to being able to use exofs's io_state engine
        from the objlayout driver. (Once I decide where to put the common code)
      
      After the RAID0 patch the outer (input) bio was never used as a bio, but
      was simply a page carrier into the raid engine. Even in the simple
      mirror/single-dev arrangement, page info was copied into a second bio.
      It is now easier to just pass a pages array into the io_state and prepare
      the bio(s) once.
      Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
      86093aaf
    • exofs: RAID0 support · 5d952b83
      Boaz Harrosh committed
      We now support striping over mirror devices, including a variable-sized
      stripe_unit.
      
      Some limits:
      * stripe_unit must be a multiple of PAGE_SIZE
      * stripe_unit * stripe_count must fit in 32 bits (at most 4 GB)
      
      Tested RAID0 over mirrors, RAID0 only, mirrors only. All check out.
      
      Design notes:
      * I'm not using a vectored raid-engine mechanism yet. Following the
        pnfs-objects-layout data-map structure, "Mirror" is just a special
        case of "group_width" == 1, and RAID0 is a special case of
        "Mirrors" == 1. The performance loss of the general case compared with
        optimizing each special case is negligible, also
        considering the extra code size.
      
      * In general I added a prepare_stripes() stage that divides the
        to-be-io pages among the participating devices. The previous
        exofs_ios_write/read now becomes _write/read_mirrors, and a new
        write/read upper layer loops over all devices, calling
        _write/read_mirrors. Effectively the prepare_stripes stage is the
        whole secret.
        Truncate also needs fixing to accommodate striping.
      
      * In a RAID0 arrangement, in a regular usage scenario, if all inode
        layouts start at the same device, small files fill up the
        first device and the later devices stay empty; the farther the device,
        the emptier it is.
      
        To fix that, each inode starts at a different stripe_unit,
        according to its obj_id modulo the number of stripe units, and
        then spans all stripe units in the same incrementing order,
        wrapping back to the beginning of the device table. We call it
        a stripe-units moving window.
      
        Special consideration was taken to keep all devices in a mirror
        arrangement identical. So a broken osd-device could just be cloned
        from one of the mirrors and no FS scrubbing is needed. (We do that
        by rotating stripe-unit at a time and not a single device at a time.)
      
      TODO:
       We no longer verify object_length == inode->i_size in exofs_iget
       (since i_size is striped across multiple objects now).
       I should introduce multiple-device attribute reading, and use
       it in exofs_iget.
      Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
      5d952b83
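The stripe-units moving window described in the RAID0 design notes above can be sketched as a mapping from a file offset to a device-table index and an offset inside the per-device object. This is a minimal illustration, not the exofs implementation: `map_offset`, `struct stripe_pos`, and the parameter names are hypothetical, and it assumes that the window start is `obj_id` modulo `group_width` and that each stripe-unit position is backed by `mirrors` consecutive entries in the device table.

```c
#include <assert.h>
#include <stdint.h>

struct stripe_pos {
    unsigned dev;        /* device-table index of the first mirror */
    uint64_t dev_offset; /* byte offset inside the object on that device */
};

/* Map a file offset to a device position under RAID0 over mirrors. */
static struct stripe_pos map_offset(uint64_t file_offset, uint64_t obj_id,
                                    uint32_t stripe_unit,
                                    unsigned group_width, unsigned mirrors)
{
    struct stripe_pos pos;
    uint64_t unit_no, stripe_no;
    unsigned col, first;

    assert(stripe_unit > 0 && group_width > 0 && mirrors > 0);

    unit_no = file_offset / stripe_unit;      /* stripe unit index in file */
    stripe_no = unit_no / group_width;        /* full stripe (row) index */
    col = (unsigned)(unit_no % group_width);  /* column before rotation */
    first = (unsigned)(obj_id % group_width); /* per-inode window start */

    /* Rotate a whole stripe unit (all its mirrors) at a time, so the
     * devices inside one mirror set stay identical. */
    pos.dev = ((first + col) % group_width) * mirrors;
    pos.dev_offset = stripe_no * stripe_unit + file_offset % stripe_unit;
    return pos;
}
```

With `stripe_unit = 4096`, `group_width = 3`, `mirrors = 2`, and `obj_id = 7`, the window starts at column 1, so the first stripe unit of the file lands on device-table index 2 rather than 0.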
    • exofs: Define on-disk per-inode optional layout attribute · d9c740d2
      Boaz Harrosh committed
      * Layouts describe the way a file is spread over multiple devices.
        The layout information is stored in the object's attribute introduced
        in this patch.
      
      * There can be multiple generating functions for the layout.
        Currently defined:
          - No attribute present - use below moving-window on global
            device table, all devices.
            (This is the only one currently used in exofs)
          - an obj_id generated moving window - the obj_id is a randomizing
            factor in the otherwise global map layout.
          - An explicit layout stored, including a data_map and a device
            index list.
          - More might be defined in future ...
      
      * There are two attributes defined of the same structure:
        A-data-files-layout - This layout is used by data-files. If present
                              at a directory, all files of that directory will
                              be created with this layout.
        A-meta-data-layout - This layout is used by a directory and other
                             meta-data information. Also inherited at creation
                             of subdirectories.
      
      * At creation time inodes are created with the layout specified above.
        A usermode utility may change the creation layout on a given directory
        or file. In the case of directories, this also applies to newly
        created files/subdirectories, children of that directory.
        In the simple unaltered case of a newly created exofs, no layout
        attributes are present, and all layouts adhere to the layout specified
        at the device-table.
      
      * In case a future file system is loaded by an old exofs driver:
        at iget(), the generating_function is inspected, and if it is not
        supported an IO error is returned to the application and the inode is
        not loaded, so as not to damage any data.
        Note: After this patch we do not yet support any type of layout.
              Only the RAID0 patch, which enables striping at the super-block
              level, will add support for the RAID0 layouts above. This way we
              are past and future compatible and fully bisectable.
      
      * Access to the device table is done by an accessor since
        it will change according to above information.
      Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
      d9c740d2
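The forward-compatibility rule in the commit message above (inspect the generating function at iget() and refuse to load the inode if it is unknown) can be sketched as follows. This is only an illustration under assumptions: the enum values, the `supported_mask` parameter, and the function name `check_layout_attr` are hypothetical and do not come from the exofs on-disk format or source.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical identifiers for the generating functions named above. */
enum gen_func {
    GEN_OBJ_ID_WINDOW = 1, /* obj_id generated moving window */
    GEN_EXPLICIT      = 2, /* explicit data_map and device index list */
};

/* Return 0 if the inode may be loaded, -EIO otherwise.
 * attr_present:   whether the optional layout attribute exists at all.
 * gen_func:       generating function read from the attribute.
 * supported_mask: bit i set means this driver supports function i. */
static int check_layout_attr(int attr_present, unsigned gen_func,
                             unsigned supported_mask)
{
    if (!attr_present)
        return 0; /* no attribute: fall back to the global device table */
    if (gen_func < 32 && (supported_mask & (1u << gen_func)))
        return 0;
    return -EIO;  /* unknown layout: do not load, so nothing is damaged */
}
```

Passing the supported set as a mask lets the same check model a driver that supports no per-inode layouts yet (mask of zero) as well as a later one that understands some of them.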
    • exofs: unindent exofs_sbi_read · 46f4d973
      Boaz Harrosh committed
      The original idea was that a mirror read could be sub-divided
      across multiple devices. But this has very little gain, and only
      at very large I/Os, so it's not going to be implemented soon.
      Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
      46f4d973
    • exofs: Move layout related members to a layout structure · 45d3abcb
      Boaz Harrosh committed
      * Abstract away those members in exofs_sb_info that are related/needed
        by a layout into a new exofs_layout structure. Embed it in exofs_sb_info.
      
      * exofs_io_state now receives and keeps a pointer to an exofs_layout. No
        need for an exofs_sb_info pointer; all we need is in exofs_layout.
      
      * Change all usages of the above exofs_sb_info members to their new names.
      Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
      45d3abcb
    • exofs: Recover in the case of read-passed-end-of-file · 22ddc556
      Boaz Harrosh committed
      In check_io, implement the case of reading past the end of
      file by clearing the pages and recovering with no error. In
      a raid arrangement this can become a legitimate situation
      in case of holes in the file.
      Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
      22ddc556
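The recovery described above, clearing what was not read and reporting success, can be illustrated on a plain byte buffer. This is a simplified sketch rather than the check_io code: the function name `recover_short_read` and its parameters are hypothetical, and a flat buffer stands in for the page array.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* A read of `requested` bytes returned only `got` bytes because the
 * object ends early (for example a hole in a striped file). Zero the
 * unread tail and report success instead of an error.
 * Returns 0 on success, -1 if the arguments are inconsistent. */
static int recover_short_read(unsigned char *buf, size_t requested,
                              size_t got)
{
    if (buf == NULL || got > requested)
        return -1;
    memset(buf + got, 0, requested - got); /* cleared data reads as a hole */
    return 0;
}
```

The caller then sees a fully populated buffer in which the missing range reads back as zeros, which is what a hole in a file is expected to return.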
    • exofs: Micro-optimize exofs_i_info · 518f167a
      Boaz Harrosh committed
      Optimize the exofs_i_info struct usage by moving the embedded
      vfs_inode to be the first member. A compiler might optimize away an "add"
      operation with constant zero (which it cannot do with other constants).
      Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
      518f167a
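The effect of placing the embedded inode first can be seen with `offsetof`. The sketch below uses stand-in types (`struct inode_stub`, `struct info_first`, `struct info_last`) and a local `CONTAINER_OF` macro written for this example; these are assumptions for illustration, not the kernel's definitions.

```c
#include <assert.h>
#include <stddef.h>

struct inode_stub { unsigned long i_ino; }; /* stand-in for struct inode */

struct info_first {                /* embedded inode is the first member */
    struct inode_stub vfs_inode;
    unsigned flags;
};

struct info_last {                 /* embedded inode follows other members */
    unsigned flags;
    struct inode_stub vfs_inode;
};

/* Recover the enclosing struct from a pointer to one of its members. */
#define CONTAINER_OF(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))
```

When the member offset is zero, the subtraction inside `CONTAINER_OF` is a no-op, so converting between the inode pointer and its container costs nothing. With a non-zero offset each conversion carries a constant adjustment.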
    • exofs: debug print even less · 34ce4e7c
      Boaz Harrosh committed
      * The last debug trimming left in some stupid prints; remove them.
        Fix up some other prints.
      * Shift printing from inode.c to ios.c
      * Add a couple of prints for when memory allocation fails.
      Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
      34ce4e7c
  2. 30 Jan, 2010 8 commits
    • Linux 2.6.33-rc6 · abe94c75
      Linus Torvalds committed
      abe94c75
    • mfd: Fix asic3 build · 4995c0b3
      Dmitry Artamonow committed
      asic3 also needs tmio_core; otherwise it will fail to build.
      Signed-off-by: Dmitry Artamonow <mad_soft@inbox.ru>
      Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
      4995c0b3
    • Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input · 499a2673
      Linus Torvalds committed
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
        Input: update multi-touch protocol documentation
        Input: add the ABS_MT_PRESSURE event
        Input: winbond-cir - remove dmesg spam
        Input: lifebook - add another Lifebook DMI signature
        Input: ad7879 - support auxiliary GPIOs via gpiolib
      499a2673
    • mm: fix migratetype bug which slowed swapping · a7016235
      Hugh Dickins committed
      After memory pressure has forced it to dip into the reserves, 2.6.32's
      5f8dcc21 "page-allocator: split per-cpu
      list into one-list-per-migrate-type" has been returning MIGRATE_RESERVE
      pages to the MIGRATE_MOVABLE free_list: in some sense depleting reserves.
      
      Fix that in the most straightforward way (which, considering the overheads
      of alternative approaches, is Mel's preference): the right migratetype is
      already in page_private(page), but free_pcppages_bulk() wasn't using it.
      
      How did this bug show up?  As a 20% slowdown in my tmpfs loop kbuild
      swapping tests, on PowerMac G5 with SLUB allocator.  Bisecting to that
      commit was easy, but explaining the magnitude of the slowdown not easy.
      
      The same effect appears, but much less markedly, with SLAB, and even
      less markedly on other machines (the PowerMac divides into fewer zones
      than x86, I think that may be a factor).  We guess that lumpy reclaim
      of short-lived high-order pages is implicated in some way, and probably
      this bug has been tickling a poor decision somewhere in page reclaim.
      
      But instrumentation hasn't told me much, I've run out of time and
      imagination to determine exactly what's going on, and shouldn't hold up
      the fix any longer: it's valid, and might even fix other misbehaviours.
      Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk>
      Acked-by: Mel Gorman <mel@csn.ul.ie>
      Cc: stable@kernel.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      a7016235
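The bug and fix described above can be pictured with a toy model in which freed pages are only counted per free list. Everything here is a simplification made up for illustration: `struct page_stub`, the `recorded_mt` field (standing in for the migratetype kept in page_private(page)), and both function names are hypothetical and are not the kernel's free_pcppages_bulk().

```c
#include <assert.h>
#include <stddef.h>

enum migratetype { MT_MOVABLE, MT_RESERVE, MT_TYPES };

struct page_stub {
    enum migratetype recorded_mt; /* stands in for page_private(page) */
};

/* Buggy behaviour (toy): every page is returned to the free list that
 * matches the per-cpu list it sat on, ignoring its recorded type. */
static void free_bulk_by_list(const struct page_stub *pages, size_t n,
                              enum migratetype list_mt,
                              unsigned free_count[MT_TYPES])
{
    (void)pages; /* the recorded type is not consulted */
    for (size_t i = 0; i < n; i++)
        free_count[list_mt]++;
}

/* Fixed behaviour (toy): each page is returned to the free list
 * recorded on the page itself. */
static void free_bulk_by_page(const struct page_stub *pages, size_t n,
                              unsigned free_count[MT_TYPES])
{
    for (size_t i = 0; i < n; i++)
        free_count[pages[i].recorded_mt]++;
}
```

In the first variant a reserve page that was parked on the movable per-cpu list ends up counted as movable, which is the sense in which the reserves get depleted. The second variant puts it back where it came from.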
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable · 67f15b06
      Linus Torvalds committed
      * git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable:
        Btrfs: check total number of devices when removing missing
        Btrfs: check return value of open_bdev_exclusive properly
        Btrfs: do not mark the chunk as readonly if in degraded mode
        Btrfs: run orphan cleanup on default fs root
        Btrfs: fix a memory leak in btrfs_init_acl
        Btrfs: Use correct values when updating inode i_size on fallocate
        Btrfs: remove tree_search() in extent_map.c
        Btrfs: Add mount -o compress-force
      67f15b06
    • sparc: TIF_ABI_PENDING bit removal · 94673e96
      David Miller committed
      Here are the sparc bits to remove TIF_ABI_PENDING now that
      set_personality() is called at the appropriate place in exec.
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Cc: stable@kernel.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      94673e96
    • x86: get rid of the insane TIF_ABI_PENDING bit · 05d43ed8
      H. Peter Anvin committed
      Now that the previous commit made it possible to do the personality
      setting at the point of no return, we do just that for ELF binaries.
      And suddenly all the reasons for that insane TIF_ABI_PENDING bit go
      away, and we can make SET_PERSONALITY() just do the obvious thing
      for a 32-bit compat process.
      
      Everything becomes much more straightforward this way.
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
      Cc: stable@kernel.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      05d43ed8
    • Split 'flush_old_exec' into two functions · 221af7f8
      Linus Torvalds committed
      'flush_old_exec()' is the point of no return when doing an execve(), and
      it is pretty badly misnamed.  It doesn't just flush the old executable
      environment, it also starts up the new one.
      
      Which is very inconvenient for things like setting up the new
      personality, because we want the new personality to affect the starting
      of the new environment, but at the same time we do _not_ want the new
      personality to take effect if flushing the old one fails.
      
      As a result, the x86-64 '32-bit' personality is actually done using this
      insane "I'm going to change the ABI, but I haven't done it yet" bit
      (TIF_ABI_PENDING), with SET_PERSONALITY() not actually setting the
      personality, but just the "pending" bit, so that "flush_thread()" can do
      the actual personality magic.
      
      This patch in no way changes any of that insanity, but it does split the
      'flush_old_exec()' function up into a preparatory part that can fail
      (still called flush_old_exec()), and a new part that will actually set
      up the new exec environment (setup_new_exec()).  All callers are changed
      to trivially comply with the new world order.
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
      Cc: stable@kernel.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      221af7f8
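The split described above follows a two-phase pattern: a first step that may fail and must leave state untouched, then a second step that cannot fail and runs only after the point of no return. The sketch below models that pattern in plain C. The struct, the `simulate_failure` flag, and the function names `flush_old`, `setup_new`, and `load_binary` are hypothetical stand-ins, not the kernel's flush_old_exec() or setup_new_exec().

```c
#include <assert.h>

struct exec_state {
    int personality;     /* currently active personality */
    int env_personality; /* personality the new environment was built with */
    int old_flushed;     /* set once the old environment is gone */
};

/* Phase 1: tear down the old environment. May fail, and on failure
 * nothing about the personality has changed. */
static int flush_old(struct exec_state *s, int simulate_failure)
{
    if (simulate_failure)
        return -1;
    s->old_flushed = 1;
    return 0;
}

/* Phase 2: start the new environment. Cannot fail, and it already
 * sees the new personality. */
static void setup_new(struct exec_state *s)
{
    s->env_personality = s->personality;
}

static int load_binary(struct exec_state *s, int new_personality,
                       int simulate_failure)
{
    int err = flush_old(s, simulate_failure);

    if (err)
        return err;                   /* old personality still in effect */
    s->personality = new_personality; /* past the point of no return */
    setup_new(s);
    return 0;
}
```

With the two phases separated, the personality can be set directly between them, so no "pending" flag is needed to defer it.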
  3. 29 Jan, 2010 20 commits
  4. 28 Jan, 2010 1 commit