1. 25 Jun, 2012 2 commits
    • A
      iommu: Remove group_mf · 7d43c2e4
      Alex Williamson committed
      The iommu=group_mf is really no longer needed with the addition of ACS
      support in IOMMU drivers creating groups.  Most multifunction devices
      will now be grouped already.  If a device has gone to the trouble of
      exposing ACS, trust that it works.  We can use the device specific ACS
      function for fixing devices we trust individually.  This largely
      reverts bcb71abe.
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
      Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
    • A
      iommu: IOMMU Groups · d72e31c9
      Alex Williamson committed
      IOMMU device groups are currently a rather vague associative notion
      with assembly required by the user or user level driver provider to
      do anything useful.  This patch intends to grow the IOMMU group concept
      into something a bit more consumable.
      
      To do this, we first create an object representing the group, struct
      iommu_group.  This structure is allocated (iommu_group_alloc) and
      filled (iommu_group_add_device) by the iommu driver.  The iommu driver
      is free to add devices to the group using its own set of policies.
      This allows inclusion of devices based on physical hardware or topology
      limitations of the platform, as well as soft requirements, such as
      multi-function trust levels or peer-to-peer protection of the
      interconnects.  Each device may only belong to a single iommu group,
      which is linked from struct device.iommu_group.  IOMMU groups are
      maintained using kobject reference counting, allowing for automatic
      removal of empty, unreferenced groups.  It is the responsibility of
      the iommu driver to remove devices from the group
      (iommu_group_remove_device).
      
      IOMMU groups also include a userspace representation in sysfs under
      /sys/kernel/iommu_groups.  When allocated, each group is given a
      dynamically assigned ID (int).  The ID is managed by the core IOMMU group
      code to support multiple heterogeneous iommu drivers, which could
      potentially collide in group naming/numbering.  This also keeps group
      IDs to small, easily managed values.  A directory is created under
      /sys/kernel/iommu_groups for each group.  A further subdirectory named
      "devices" contains links to each device within the group.  The iommu_group
      file in the device's sysfs directory, which formerly contained a group
      number when read, is now a link to the iommu group.  Example:
      
      $ ls -l /sys/kernel/iommu_groups/26/devices/
      total 0
      lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:00:1e.0 ->
      		../../../../devices/pci0000:00/0000:00:1e.0
      lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:06:0d.0 ->
      		../../../../devices/pci0000:00/0000:00:1e.0/0000:06:0d.0
      lrwxrwxrwx. 1 root root 0 Apr 17 12:57 0000:06:0d.1 ->
      		../../../../devices/pci0000:00/0000:00:1e.0/0000:06:0d.1
      
      $ ls -l  /sys/kernel/iommu_groups/26/devices/*/iommu_group
      [truncating perms/owner/timestamp]
      /sys/kernel/iommu_groups/26/devices/0000:00:1e.0/iommu_group ->
      					../../../kernel/iommu_groups/26
      /sys/kernel/iommu_groups/26/devices/0000:06:0d.0/iommu_group ->
      					../../../../kernel/iommu_groups/26
      /sys/kernel/iommu_groups/26/devices/0000:06:0d.1/iommu_group ->
      					../../../../kernel/iommu_groups/26
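
The sysfs layout shown above can also be walked from a script. The following is a minimal sketch, assuming only the directory structure described in this commit message (`<root>/<group id>/devices/<device>`); the function name `list_iommu_groups` and its `root` parameter are illustrative and not part of any kernel interface.

```python
import os


def list_iommu_groups(root="/sys/kernel/iommu_groups"):
    """Map each numeric group ID to the sorted device names in its devices/ directory."""
    groups = {}
    if not os.path.isdir(root):
        # No IOMMU groups exposed (or a kernel without this feature).
        return groups
    for entry in os.listdir(root):
        devices_dir = os.path.join(root, entry, "devices")
        # Group directories are named by their integer ID.
        if entry.isdigit() and os.path.isdir(devices_dir):
            groups[int(entry)] = sorted(os.listdir(devices_dir))
    return groups
```

On a system laid out like the example above, the entry for group 26 would list the three devices `0000:00:1e.0`, `0000:06:0d.0` and `0000:06:0d.1`. The `root` parameter exists so the function can be pointed at a test directory.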
      
      Groups also include several exported functions for use by user level
      driver providers, for example VFIO.  These include:
      
      iommu_group_get(): Acquires a reference to a group from a device
      iommu_group_put(): Releases reference
      iommu_group_for_each_dev(): Iterates over group devices using callback
      iommu_group_[un]register_notifier(): Allows notification of device add
              and remove operations relevant to the group
      iommu_group_id(): Return the group number
      
      This patch also extends the IOMMU API to allow attaching groups to
      domains.  This is currently a simple wrapper for iterating through
      devices within a group, but it's expected that the IOMMU API may
      eventually make groups a more integral part of domains.
      
      Groups intentionally do not try to manage group ownership.  A user
      level driver provider must independently acquire ownership for each
      device within a group before making use of the group as a whole.
      This may change in the future if group usage becomes more pervasive
      across both DMA and IOMMU ops.
      
      Groups intentionally do not provide a mechanism for driver locking
      or otherwise manipulating driver matching/probing of devices within
      the group.  Such interfaces are generic to devices and beyond the
      scope of IOMMU groups.  If implemented, user level providers have
      ready access via iommu_group_for_each_dev and group notifiers.
      
      iommu_device_group() is removed here as it has no users.  The
      replacement is:
      
      	group = iommu_group_get(dev);
      	id = iommu_group_id(group);
      	iommu_group_put(group);
      
      AMD-Vi & Intel VT-d support re-added in following patches.
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
      Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
  2. 07 Jun, 2012 1 commit
  3. 04 Jun, 2012 1 commit
  4. 03 Jun, 2012 1 commit
    • J
      dm thin: provide userspace access to pool metadata · cc8394d8
      Joe Thornber committed
      This patch implements two new messages that can be sent to the thin
      pool target allowing it to take a snapshot of the _metadata_.  This
      read-only snapshot can be accessed by userland, concurrently with the
      live target.
      
      Only one metadata snapshot can be held at a time.  The pool's status
      line will give the block location for the current msnap.
      
      Since version 0.1.5 of the userland thin provisioning tools, the
      thin_dump program displays the msnap as follows:
      
          thin_dump -m <msnap root> <metadata dev>
      
      Available here: https://github.com/jthornber/thin-provisioning-tools
      
      Now that userland can access the metadata we can do various things
      that have traditionally been kernel side tasks:
      
           i) Incremental backups.
      
           By using metadata snapshots we can work out what blocks have
           changed over time.  Combined with data snapshots we can ensure
           the data doesn't change while we back it up.
      
           A short proof of concept script can be found here:
      
           https://github.com/jthornber/thinp-test-suite/blob/master/incremental_backup_example.rb
      
           ii) Migration of thin devices from one pool to another.
      
           iii) Merging snapshots back into an external origin.
      
           iv) Asynchronous replication.
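
The incremental backup idea in item i) comes down to comparing two views of the block mappings. The following is a minimal sketch, assuming each metadata snapshot has already been parsed into a dict from virtual block to data block. The parsing step, the dict representation and the name `changed_blocks` are all assumptions made for illustration; they are not taken from the thin provisioning tools.

```python
def changed_blocks(old_map, new_map):
    """Return the sorted virtual blocks whose mapping differs between two snapshots.

    Each map is a dict of virtual block -> data block.
    """
    changed = set()
    for vblock, dblock in new_map.items():
        # Newly mapped blocks and remapped blocks both count as changed.
        if old_map.get(vblock) != dblock:
            changed.add(vblock)
    # Blocks that were mapped before but no longer are.
    changed.update(vblock for vblock in old_map if vblock not in new_map)
    return sorted(changed)
```

Only the blocks returned here would need to be copied in the next backup pass.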
      Signed-off-by: Joe Thornber <ejt@redhat.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
  5. 02 Jun, 2012 2 commits
    • M
      x86, efi: Add EFI boot stub documentation · 0c759662
      Matt Fleming committed
      Since we can't expect every user to read the EFI boot stub code it
      seems prudent to have a couple of paragraphs explaining what it is and
      how it works.
      
      The "initrd=" option in particular is tricky because it only
      understands absolute EFI-style paths (backslashes as directory
      separators), and until now this hasn't been documented anywhere. This
      has tripped up a couple of users.
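
To make the "initrd=" path rule concrete, here is a small sketch that turns an absolute slash-separated path into the backslash-separated form described above. The helper name `to_efi_initrd_arg` and the example path are hypothetical; only the rule itself (absolute path, backslashes as separators) comes from this commit message.

```python
def to_efi_initrd_arg(posix_path):
    """Build an initrd= argument using backslashes as directory separators."""
    if not posix_path.startswith("/"):
        # The option only understands absolute paths.
        raise ValueError("initrd= needs an absolute path")
    return "initrd=" + posix_path.replace("/", "\\")
```

For example, `to_efi_initrd_arg("/boot/initrd.img")` yields the string `initrd=\boot\initrd.img`.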
      
      Cc: Matthew Garrett <mjg@redhat.com>
      Cc: Randy Dunlap <rdunlap@xenotime.net>
      Signed-off-by: Matt Fleming <matt.fleming@intel.com>
      Link: http://lkml.kernel.org/r/1331907517-3985-4-git-send-email-matt@console-pimps.org
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
    • J
      fs: introduce inode operation ->update_time · c3b2da31
      Josef Bacik committed
      Btrfs has to make sure we have space to allocate new blocks in order to modify
      the inode, so updating time can fail.  We've gotten around this by having our
      own file_update_time but this is kind of a pain, and Christoph has indicated he
      would like to make xfs do something different with atime updates.  So introduce
      ->update_time, where we will deal with i_version and a/m/c time updates and
      indicate which changes need to be made.  The normal version just does what it
      has always done, updates the time and marks the inode dirty, and then
      filesystems can choose to do something different.
      
      I've gone through all of the users of file_update_time and made them check for
      errors with the exception of the fault code since it's complicated and I wasn't
      quite sure what to do there, also Jan is going to be pushing the file time
      updates into page_mkwrite for those who have it so that should satisfy btrfs and
      make it not a big deal to check the file_update_time() return code in the
      generic fault path. Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
  6. 01 Jun, 2012 5 commits
  7. 31 May, 2012 1 commit
  8. 30 May, 2012 18 commits
  9. 25 May, 2012 8 commits
    • S
      doc: ext3: update documentation with barrier=1 default · 21848d08
      Stefan Hajnoczi committed
      Commit 00eacd66 ("ext3: make ext3 mount default to barrier=1") changed
      the default barrier mount option for ext3.  The documentation needs to
      be updated, so this patch does that.
      Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
      Acked-by: Rob Landley <rob@landley.net>
      Signed-off-by: Jiri Kosina <jkosina@suse.cz>
    • M
      Documentation/initrd.txt: Change the location of util-linux · 76dab97c
      Marcos Paulo de Souza committed
      The address of util-linux seems to be outdated. util-linux is now hosted
      on kernel.org, so update the documentation to point to the correct address.
      Signed-off-by: Marcos Paulo de Souza <marcos.souza.org@gmail.com>
      Acked-by: Rob Landley <rob@landley.net>
      Signed-off-by: Jiri Kosina <jkosina@suse.cz>
    • M
      Documentation/SubmittingPatches: suggested the use of scripts/get_maintainer.pl · e52d2e1f
      Michel Machado committed
      Had I found a reference to scripts/get_maintainer.pl when I first read
      Documentation/SubmittingPatches, it would've saved me some time.
      Signed-off-by: Michel Machado <michel@digirati.com.br>
      Acked-by: Rob Landley <rob@landley.net>
      Signed-off-by: Jiri Kosina <jkosina@suse.cz>
    • S
      Documentation/kernel-parameters: remove autotest and mcatest · 9b170dbd
      Sebastian Andrzej Siewior committed
      It has no more users, the last one is gone in "[PATCH] ia64: Kconfig
      cleanup" aka ("6fd79ab50b").
      mcatest is gone in commit "[PATCH] ia64: SGI SN update"
      ("c6bacd5010ec").
      
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Randy Dunlap <rdunlap@xenotime.net>
      Signed-off-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
      Acked-by: Rob Landley <rob@landley.net>
      Signed-off-by: Jiri Kosina <jkosina@suse.cz>
    • D
      dma-buf: add initial vmap documentation · b25b086d
      Dave Airlie committed
      Signed-off-by: Dave Airlie <airlied@redhat.com>
      Signed-off-by: Sumit Semwal <sumit.semwal@linaro.org>
    • D
      dma-buf: mmap support · 4c78513e
      Daniel Vetter committed
      Compared to Rob Clark's RFC I've ditched the prepare/finish hooks
      and corresponding ioctls on the dma_buf file. The major reason for
      that is that many people seem to be under the impression that this is
      also for synchronization with outstanding asynchronous processing.
      I'm pretty massively opposed to this because:
      
      - It boils down to reinventing a new rather general-purpose userspace
        synchronization interface. If we look at things like futexes, this
        is hard to get right.
      - Furthermore a lot of kernel code has to interact with this
        synchronization primitive. This smells a lot like the dri1 hw_lock,
        a horror show I prefer not to reinvent.
      - Even more fun is that multiple different subsystems would interact
        here, so we have plenty of opportunities to create funny deadlock
        scenarios.
      
      I think synchronization is a wholesale different problem from data
      sharing and should be tackled as an orthogonal problem.
      
      Now we could demand that prepare/finish may only ensure cache
      coherency (as Rob intended), but that runs up into the next problem:
      We not only need mmap support to facilitate sw-only processing nodes
      in a pipeline (without jumping through hoops by importing the dma_buf
      into some sw-access only importer), which allows for a nicer
      ION->dma-buf upgrade path for existing Android userspace. We also need
      mmap support for existing importing subsystems to support existing
      userspace libraries. And a lot of these subsystems are expected to
      export coherent userspace mappings.
      
      So prepare/finish can only ever be optional and the exporter /needs/
      to support coherent mappings. Given that mmap access is always
      somewhat fallback-y in nature I've decided to drop this optimization,
      instead of just making it optional. If we demonstrate a clear need for
      this, supported by benchmark results, we can always add it in again
      later as an optional extension.
      
      The other difference compared to Rob's RFC is the above-mentioned support
      for mapping a dma-buf through facilities provided by the importer.
      Which results in mmap support no longer being optional.
      
      Note that this dma-buf mmap patch does _not_ support every possible
      insanity an existing subsystem could pull off with mmap: Because it
      does not allow intercepting pagefaults and shooting down ptes, importing
      subsystems can't add some magic of their own at these points (e.g. to
      automatically synchronize with outstanding rendering or set up some
      special resources). I've done a cursory read through a few mmap
      implementations of various subsystems and I'm hopeful that we can avoid
      this (and the complexity it'd bring with it).
      
      Additionally I've extended the documentation a bit to explain the hows
      and whys of this mmap extension.
      
      In case we ever want to add support for explicitly cache managed
      userspace mmap with a prepare/finish ioctl pair, we could specify that
      userspace needs to mmap a different part of the dma_buf, e.g. the
      range starting at dma_buf->size up to dma_buf->size*2. This works
      because the size of a dma_buf is invariant over its lifetime. The
      exporter would obviously need to fall back to coherent mappings for
      both ranges if a legacy client maps the coherent range and the
      architecture cannot support conflicting caching policies. Also, this
      would obviously be optional and userspace needs to be able to fall
      back to coherent mappings.
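
The two-range idea floated in the paragraph above can be modelled as simple offset arithmetic. This is only a sketch of that hypothetical extension, which the commit does not implement; the function name `classify_mmap_offset` and the range labels are invented for illustration.

```python
def classify_mmap_offset(offset, buf_size):
    """Return (range kind, offset within the buffer) for a hypothetical dual-range mmap.

    [0, buf_size)            -> the coherent range
    [buf_size, 2 * buf_size) -> the explicitly cache managed range
    """
    if buf_size <= 0:
        raise ValueError("buffer size must be positive")
    if 0 <= offset < buf_size:
        return ("coherent", offset)
    if buf_size <= offset < 2 * buf_size:
        # The second range aliases the same buffer, shifted by its size.
        return ("cache-managed", offset - buf_size)
    raise ValueError("offset outside both ranges")
```

The scheme relies on the buffer size never changing, which is why the invariance of the dma_buf size matters.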
      
      v2:
      - Spelling fixes from Rob Clark.
      - Compile fix for !DMA_BUF from Rob Clark.
      - Extend commit message to explain how explicitly cache managed mmap
        support could be added later.
      - Extend the documentation with implementation notes for exporters
        that need to manually fake coherency.
      
      v3:
      - dma_buf pointer initialization goof-up noticed by Rebecca Schultz
        Zavin.
      
      Cc: Rob Clark <rob.clark@linaro.org>
      Cc: Rebecca Schultz Zavin <rebecca@android.com>
      Acked-by: Rob Clark <rob.clark@linaro.org>
      Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      Signed-off-by: Sumit Semwal <sumit.semwal@linaro.org>
    • M
      tick: Add tick skew boot option · 5307c955
      Mike Galbraith committed
      Let the user decide whether power consumption or jitter is the
      more important consideration for their machines.
      
      Quoting removal commit af5ab277:
      
      "Historically, Linux has tried to make the regular timer tick on the
       various CPUs not happen at the same time, to avoid contention on
       xtime_lock.
          
       Nowadays, with the tickless kernel, this contention no longer happens
       since time keeping and updating are done differently. In addition,
       this skew is actually hurting power consumption in a measurable way on
       many-core systems."
      
      Problems:
      
      - Contrary to the above, systems do encounter contention on both
        xtime_lock and RCU structure locks when the tick is synchronized.
        
      - Moderate sized RT systems suffer intolerable jitter due to the tick
        being synchronized.
      
      - SGI reports the same for their large systems.
      
      - Fully utilized systems reap no power saving benefit from skew removal,
        but do suffer from resulting induced lock contention.
      
      - 0209f649 rcu: limit rcu_node leaf-level fanout
        This patch was born to combat lock contention which testing showed
        to have been _induced by_ skew removal.  Skew the tick, contention
        disappeared virtually completely.
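
To illustrate what skewing the tick means, the sketch below staggers each CPU's tick by an equal step so that no two CPUs tick at the same instant. Spreading the offsets across half a tick period is an assumption made for this illustration, as is the function name `tick_skew_offsets_ns`; the commit message does not describe the exact formula.

```python
def tick_skew_offsets_ns(hz, num_cpus):
    """Return a per-CPU tick offset in nanoseconds, evenly spread over half a tick period."""
    tick_period_ns = 1_000_000_000 // hz
    # Assumed policy: divide half of one tick period evenly among the CPUs.
    per_cpu_step = (tick_period_ns // 2) // num_cpus
    return [cpu * per_cpu_step for cpu in range(num_cpus)]
```

With a 1000 Hz tick and 4 CPUs this gives offsets of 0, 125000, 250000 and 375000 ns, so each CPU takes the tick-time locks at a different moment.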
      Signed-off-by: Mike Galbraith <mgalbraith@suse.de>
      Link: http://lkml.kernel.org/r/1336472458.21924.78.camel@marge.simpson.net
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • J
      net/wanrouter: Deprecate and schedule for removal · f0d1b3c2
      Joe Perches committed
      No one uses this on current kernels anymore.
      
      Let it be known it's going to be removed eventually.
      Signed-off-by: Joe Perches <joe@perches.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  10. 23 May, 2012 1 commit