1. 03 3月, 2013 1 次提交
    • Y
      x86, ACPI, mm: Revert movablemem_map support · 20e6926d
      Yinghai Lu 提交于
      Tim found:
      
        WARNING: at arch/x86/kernel/smpboot.c:324 topology_sane.isra.2+0x6f/0x80()
        Hardware name: S2600CP
        sched: CPU #1's llc-sibling CPU #0 is not on the same node! [node: 1 != 0]. Ignoring dependency.
        smpboot: Booting Node   1, Processors  #1
        Modules linked in:
        Pid: 0, comm: swapper/1 Not tainted 3.9.0-0-generic #1
        Call Trace:
          set_cpu_sibling_map+0x279/0x449
          start_secondary+0x11d/0x1e5
      
      Don Morris reproduced on a HP z620 workstation, and bisected it to
      commit e8d19552 ("acpi, memory-hotplug: parse SRAT before memblock
      is ready")
      
      It turns out movable_map has some problems, and it breaks several things
      
      1. numa_init is called several times, NOT just for srat. so those
      	nodes_clear(numa_nodes_parsed)
      	memset(&numa_meminfo, 0, sizeof(numa_meminfo))
         can not be just removed.  Need to consider sequence is: numaq, srat, amd, dummy.
         and make fall back path working.
      
      2. simply split acpi_numa_init to early_parse_srat.
         a. that early_parse_srat is NOT called for ia64, so you break ia64.
         b.  for (i = 0; i < MAX_LOCAL_APIC; i++)
      	     set_apicid_to_node(i, NUMA_NO_NODE)
           still left in numa_init. So it will just clear result from early_parse_srat.
           it should be moved before that....
         c.  it breaks ACPI_TABLE_OVERIDE...as the acpi table scan is moved
             early before override from INITRD is settled.
      
      3. that patch TITLE is total misleading, there is NO x86 in the title,
         but it changes critical x86 code. It caused x86 guys did not
         pay attention to find the problem early. Those patches really should
         be routed via tip/x86/mm.
      
      4. after that commit, following range can not use movable ram:
        a. real_mode code.... well..funny, legacy Node0 [0,1M) could be hot-removed?
        b. initrd... it will be freed after booting, so it could be on movable...
        c. crashkernel for kdump...: looks like we can not put kdump kernel above 4G
      	anymore.
        d. init_mem_mapping: can not put page table high anymore.
        e. initmem_init: vmemmap can not be high local node anymore. That is
           not good.
      
      If node is hotplugable, the mem related range like page table and
      vmemmap could be on the that node without problem and should be on that
      node.
      
      We have workaround patch that could fix some problems, but some can not
      be fixed.
      
      So just remove that offending commit and related ones including:
      
       f7210e6c ("mm/memblock.c: use CONFIG_HAVE_MEMBLOCK_NODE_MAP to
          protect movablecore_map in memblock_overlaps_region().")
      
       01a178a9 ("acpi, memory-hotplug: support getting hotplug info from
          SRAT")
      
       27168d38 ("acpi, memory-hotplug: extend movablemem_map ranges to
          the end of node")
      
       e8d19552 ("acpi, memory-hotplug: parse SRAT before memblock is
          ready")
      
       fb06bc8e ("page_alloc: bootmem limit with movablecore_map")
      
       42f47e27 ("page_alloc: make movablemem_map have higher priority")
      
       6981ec31 ("page_alloc: introduce zone_movable_limit[] to keep
          movable limit for nodes")
      
       34b71f1e ("page_alloc: add movable_memmap kernel parameter")
      
       4d59a751 ("x86: get pg_data_t's memory from other node")
      
      Later we should have patches that will make sure kernel put page table
      and vmemmap on local node ram instead of push them down to node0.  Also
      need to find way to put other kernel used ram to local node ram.
      Reported-by: NTim Gardner <tim.gardner@canonical.com>
      Reported-by: NDon Morris <don.morris@hp.com>
      Bisected-by: NDon Morris <don.morris@hp.com>
      Tested-by: NDon Morris <don.morris@hp.com>
      Signed-off-by: NYinghai Lu <yinghai@kernel.org>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Thomas Renninger <trenn@suse.de>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Tang Chen <tangchen@cn.fujitsu.com>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      20e6926d
  2. 02 3月, 2013 7 次提交
    • A
      dm: add target num_write_bios fn · b0d8ed4d
      Alasdair G Kergon 提交于
      Add a num_write_bios function to struct target.
      
      If an instance of a target sets this, it will be queried before the
      target's mapping function is called on a write bio, and the response
      controls the number of copies of the write bio that the target will
      receive.
      
      This provides a convenient way for a target to send the same data to
      more than one device.  The new cache target uses this in writethrough
      mode, to send the data both to the cache and the backing device.
      Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
      b0d8ed4d
    • M
      dm kcopyd: introduce configurable throttling · df5d2e90
      Mikulas Patocka 提交于
      This patch allows the administrator to reduce the rate at which kcopyd
      issues I/O.
      
      Each module that uses kcopyd acquires a throttle parameter that can be
      set in /sys/module/*/parameters.
      
      We maintain a history of kcopyd usage by each module in the variables
      io_period and total_period in struct dm_kcopyd_throttle. The actual
      kcopyd activity is calculated as a percentage of time equal to
      "(100 * io_period / total_period)".  This is compared with the user-defined
      throttle percentage threshold and if it is exceeded, we sleep.
      Signed-off-by: NMikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
      df5d2e90
    • M
      dm ioctl: allow message to return data · a2606241
      Mikulas Patocka 提交于
      This patch introduces enhanced message support that allows the
      device-mapper core to recognise messages that are common to all devices,
      and for messages to return data to userspace.
      
      Core messages are processed by the function "message_for_md".  If the
      device mapper doesn't support the message, it is passed to the target
      driver.
      
      If the message returns data, the kernel sets the flag
      DM_MESSAGE_OUT_FLAG.
      Signed-off-by: NMikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
      a2606241
    • M
      dm ioctl: optimize functions without variable params · 02cde50b
      Mikulas Patocka 提交于
      Device-mapper ioctls receive and send data in a buffer supplied
      by userspace.  The buffer has two parts.  The first part contains
      a 'struct dm_ioctl' and has a fixed size.  The second part depends
      on the ioctl and has a variable size.
      
      This patch recognises the specific ioctls that do not use the variable
      part of the buffer and skips allocating memory for it.
      
      In particular, when a device is suspended and a resume ioctl is sent,
      this now avoid memory allocation completely.
      
      The variable "struct dm_ioctl tmp" is moved from the function
      copy_params to its caller ctl_ioctl and renamed to param_kernel.
      It is used directly when the ioctl function doesn't need any arguments.
      Signed-off-by: NMikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
      02cde50b
    • A
      dm: rename request variables to bios · 55a62eef
      Alasdair G Kergon 提交于
      Use 'bio' in the name of variables and functions that deal with
      bios rather than 'request' to avoid confusion with the normal
      block layer use of 'request'.
      
      No functional changes.
      Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
      55a62eef
    • M
      dm: fix truncated status strings · fd7c092e
      Mikulas Patocka 提交于
      Avoid returning a truncated table or status string instead of setting
      the DM_BUFFER_FULL_FLAG when the last target of a table fills the
      buffer.
      
      When processing a table or status request, the function retrieve_status
      calls ti->type->status. If ti->type->status returns non-zero,
      retrieve_status assumes that the buffer overflowed and sets
      DM_BUFFER_FULL_FLAG.
      
      However, targets don't return non-zero values from their status method
      on overflow. Most targets returns always zero.
      
      If a buffer overflow happens in a target that is not the last in the
      table, it gets noticed during the next iteration of the loop in
      retrieve_status; but if a buffer overflow happens in the last target, it
      goes unnoticed and erroneously truncated data is returned.
      
      In the current code, the targets behave in the following way:
      * dm-crypt returns -ENOMEM if there is not enough space to store the
        key, but it returns 0 on all other overflows.
      * dm-thin returns errors from the status method if a disk error happened.
        This is incorrect because retrieve_status doesn't check the error
        code, it assumes that all non-zero values mean buffer overflow.
      * all the other targets always return 0.
      
      This patch changes the ti->type->status function to return void (because
      most targets don't use the return code). Overflow is detected in
      retrieve_status: if the status method fills up the remaining space
      completely, it is assumed that buffer overflow happened.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: NMikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
      fd7c092e
    • R
      hsi: fix kernel-doc warnings · 8eae508b
      Randy Dunlap 提交于
      Fix kernel-doc warnings in hsi files:
      
        Warning(include/linux/hsi/hsi.h:136): Excess struct/union/enum/typedef member 'e_handler' description in 'hsi_client'
        Warning(include/linux/hsi/hsi.h:136): Excess struct/union/enum/typedef member 'pclaimed' description in 'hsi_client'
        Warning(include/linux/hsi/hsi.h:136): Excess struct/union/enum/typedef member 'nb' description in 'hsi_client'
        Warning(drivers/hsi/hsi.c:434): No description found for parameter 'handler'
        Warning(drivers/hsi/hsi.c:434): Excess function parameter 'cb' description in 'hsi_register_port_event'
      
      Don't document "private:" fields with kernel-doc notation.
      If you want to leave them fully documented, that's OK, but
      then don't mark them as "private:".
      Signed-off-by: NRandy Dunlap <rdunlap@infradead.org>
      Cc: Carlos Chinea <carlos.chinea@nokia.com>
      Cc: Linus Walleij <linus.walleij@linaro.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: linux-kernel@vger.kernel.org
      Cc: linux-omap@vger.kernel.org
      Acked-by: NNishanth Menon <nm@ti.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      8eae508b
  3. 01 3月, 2013 3 次提交
  4. 28 2月, 2013 22 次提交
    • S
      hlist: drop the node parameter from iterators · b67bfe0d
      Sasha Levin 提交于
      I'm not sure why, but the hlist for each entry iterators were conceived
      
              list_for_each_entry(pos, head, member)
      
      The hlist ones were greedy and wanted an extra parameter:
      
              hlist_for_each_entry(tpos, pos, head, member)
      
      Why did they need an extra pos parameter? I'm not quite sure. Not only
      they don't really need it, it also prevents the iterator from looking
      exactly like the list iterator, which is unfortunate.
      
      Besides the semantic patch, there was some manual work required:
      
       - Fix up the actual hlist iterators in linux/list.h
       - Fix up the declaration of other iterators based on the hlist ones.
       - A very small amount of places were using the 'node' parameter, this
       was modified to use 'obj->member' instead.
       - Coccinelle didn't handle the hlist_for_each_entry_safe iterator
       properly, so those had to be fixed up manually.
      
      The semantic patch which is mostly the work of Peter Senna Tschudin is here:
      
      @@
      iterator name hlist_for_each_entry, hlist_for_each_entry_continue, hlist_for_each_entry_from, hlist_for_each_entry_rcu, hlist_for_each_entry_rcu_bh, hlist_for_each_entry_continue_rcu_bh, for_each_busy_worker, ax25_uid_for_each, ax25_for_each, inet_bind_bucket_for_each, sctp_for_each_hentry, sk_for_each, sk_for_each_rcu, sk_for_each_from, sk_for_each_safe, sk_for_each_bound, hlist_for_each_entry_safe, hlist_for_each_entry_continue_rcu, nr_neigh_for_each, nr_neigh_for_each_safe, nr_node_for_each, nr_node_for_each_safe, for_each_gfn_indirect_valid_sp, for_each_gfn_sp, for_each_host;
      
      type T;
      expression a,c,d,e;
      identifier b;
      statement S;
      @@
      
      -T b;
          <+... when != b
      (
      hlist_for_each_entry(a,
      - b,
      c, d) S
      |
      hlist_for_each_entry_continue(a,
      - b,
      c) S
      |
      hlist_for_each_entry_from(a,
      - b,
      c) S
      |
      hlist_for_each_entry_rcu(a,
      - b,
      c, d) S
      |
      hlist_for_each_entry_rcu_bh(a,
      - b,
      c, d) S
      |
      hlist_for_each_entry_continue_rcu_bh(a,
      - b,
      c) S
      |
      for_each_busy_worker(a, c,
      - b,
      d) S
      |
      ax25_uid_for_each(a,
      - b,
      c) S
      |
      ax25_for_each(a,
      - b,
      c) S
      |
      inet_bind_bucket_for_each(a,
      - b,
      c) S
      |
      sctp_for_each_hentry(a,
      - b,
      c) S
      |
      sk_for_each(a,
      - b,
      c) S
      |
      sk_for_each_rcu(a,
      - b,
      c) S
      |
      sk_for_each_from
      -(a, b)
      +(a)
      S
      + sk_for_each_from(a) S
      |
      sk_for_each_safe(a,
      - b,
      c, d) S
      |
      sk_for_each_bound(a,
      - b,
      c) S
      |
      hlist_for_each_entry_safe(a,
      - b,
      c, d, e) S
      |
      hlist_for_each_entry_continue_rcu(a,
      - b,
      c) S
      |
      nr_neigh_for_each(a,
      - b,
      c) S
      |
      nr_neigh_for_each_safe(a,
      - b,
      c, d) S
      |
      nr_node_for_each(a,
      - b,
      c) S
      |
      nr_node_for_each_safe(a,
      - b,
      c, d) S
      |
      - for_each_gfn_sp(a, c, d, b) S
      + for_each_gfn_sp(a, c, d) S
      |
      - for_each_gfn_indirect_valid_sp(a, c, d, b) S
      + for_each_gfn_indirect_valid_sp(a, c, d) S
      |
      for_each_host(a,
      - b,
      c) S
      |
      for_each_host_safe(a,
      - b,
      c, d) S
      |
      for_each_mesh_entry(a,
      - b,
      c, d) S
      )
          ...+>
      
      [akpm@linux-foundation.org: drop bogus change from net/ipv4/raw.c]
      [akpm@linux-foundation.org: drop bogus hunk from net/ipv6/raw.c]
      [akpm@linux-foundation.org: checkpatch fixes]
      [akpm@linux-foundation.org: fix warnings]
      [akpm@linux-foudnation.org: redo intrusive kvm changes]
      Tested-by: NPeter Senna Tschudin <peter.senna@gmail.com>
      Acked-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Signed-off-by: NSasha Levin <sasha.levin@oracle.com>
      Cc: Wu Fengguang <fengguang.wu@intel.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Gleb Natapov <gleb@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b67bfe0d
    • M
      include/linux/eventfd.h: fix incorrect filename is a comment · 1d730c49
      Martin Sustrik 提交于
      Comment in eventfd.h referred to 'include/asm-generic/fcntl.h'
      while the correct path is 'include/uapi/asm-generic/fcntl.h'.
      Signed-off-by: NMartin Sustrik <sustrik@250bpm.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      1d730c49
    • A
      nbd: support FLUSH requests · 75f187ab
      Alex Bligh 提交于
      Currently, the NBD device does not accept flush requests from the Linux
      block layer.  If the NBD server opened the target with neither O_SYNC nor
      O_DSYNC, however, the device will be effectively backed by a writeback
      cache.  Without issuing flushes properly, operation of the NBD device will
      not be safe against power losses.
      
      The NBD protocol has support for both a cache flush command and a FUA
      command flag; the server will also pass a flag to note its support for
      these features.  This patch adds support for the cache flush command and
      flag.  In the kernel, we receive the flags via the NBD_SET_FLAGS ioctl,
      and map NBD_FLAG_SEND_FLUSH to the argument of blk_queue_flush.  When the
      flag is active the block layer will send REQ_FLUSH requests, which we
      translate to NBD_CMD_FLUSH commands.
      
      FUA support is not included in this patch because all free software
      servers implement it with a full fdatasync; thus it has no advantage over
      supporting flush only.  Because I [Paolo] cannot really benchmark it in a
      realistic scenario, I cannot tell if it is a good idea or not.  It is also
      not clear if it is valid for an NBD server to support FUA but not flush.
      The Linux block layer gives a warning for this combination, the NBD
      protocol documentation says nothing about it.
      
      The patch also fixes a small problem in the handling of flags: nbd->flags
      must be cleared at the end of NBD_DO_IT, but the driver was not doing
      that.  The bug manifests itself as follows.  Suppose you two different
      client/server pairs to start the NBD device.  Suppose also that the first
      client supports NBD_SET_FLAGS, and the first server sends
      NBD_FLAG_SEND_FLUSH; the second pair instead does neither of these two
      things.  Before this patch, the second invocation of NBD_DO_IT will use a
      stale value of nbd->flags, and the second server will issue an error every
      time it receives an NBD_CMD_FLUSH command.
      
      This bug is pre-existing, but it becomes much more important after this
      patch; flush failures make the device pretty much unusable, unlike
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: NAlex Bligh <alex@alex.org.uk>
      Acked-by: NPaul Clements <Paul.Clements@steeleye.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      75f187ab
    • R
      ipmi: remove superfluous kernel/userspace explanation · 59fb1b9f
      Robert P. J. Day 提交于
      Given the obvious distinction between kernel and userspace supported
      by uapi/, it seems unnecessary to comment on that.
      Signed-off-by: NRobert P. J. Day <rpjday@crashcourse.ca>
      Signed-off-by: NCorey Minyard <cminyard@mvista.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      59fb1b9f
    • T
      idr: implement lookup hint · 0ffc2a9c
      Tejun Heo 提交于
      While idr lookup isn't a particularly heavy operation, it still is too
      substantial to use in hot paths without worrying about the performance
      implications.  With recent changes, each idr_layer covers 256 slots
      which should be enough to cover most use cases with single idr_layer
      making lookup hint very attractive.
      
      This patch adds idr->hint which points to the idr_layer which
      allocated an ID most recently and the fast path lookup becomes
      
      	if (look up target's prefix matches that of the hinted layer)
      		return hint->ary[ID's offset in the leaf layer];
      
      which can be inlined.
      
      idr->hint is set to the leaf node on idr_fill_slot() and cleared from
      free_layer().
      
      [andriy.shevchenko@linux.intel.com: always do slow path when hint is uninitialized]
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Sasha Levin <sasha.levin@oracle.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      0ffc2a9c
    • T
      idr: add idr_layer->prefix · 54616283
      Tejun Heo 提交于
      Add a field which carries the prefix of ID the idr_layer covers.  This
      will be used to implement lookup hint.
      
      This patch doesn't make use of the new field and doesn't introduce any
      behavior difference.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      54616283
    • T
      idr: make idr_layer larger · 050a6b47
      Tejun Heo 提交于
      With recent preloading changes, idr no longer keeps full layer cache per
      each idr instance (used to be ~6.5k per idr on 64bit) and the previous
      patch removed restriction on the bitmap size.  Both now allow us to have
      larger layers.
      
      Increase IDR_BITS to 8 regardless of BITS_PER_LONG.  Each layer is
      slightly larger than 2k on 64bit and 1k on 32bit and carries 256 entries.
      The size isn't too large, especially compared to what we used to waste on
      per-idr caches, and 256 entries should be able to serve most use cases
      with single layer.  The max tree depth is 4 which is much better than the
      previous 6 on 64bit and 7 on 32bit.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      050a6b47
    • T
      idr: remove length restriction from idr_layer->bitmap · 1d9b2e1e
      Tejun Heo 提交于
      Currently, idr->bitmap is declared as an unsigned long which restricts
      the number of bits an idr_layer can contain.  All bitops can handle
      arbitrary positive integer bit number and there's no reason for this
      restriction.
      
      Declare idr_layer->bitmap using DECLARE_BITMAP() instead of a single
      unsigned long.
      
      * idr_layer->bitmap is now an array.  '&' dropped from params to
        bitops.
      
      * Replaced "== IDR_FULL" tests with bitmap_full() and removed
        IDR_FULL.
      
      * Replaced find_next_bit() on ~bitmap with find_next_zero_bit().
      
      * Replaced "bitmap = 0" with bitmap_clear().
      
      This patch doesn't (or at least shouldn't) introduce any behavior
      changes.
      
      [akpm@linux-foundation.org: checkpatch fixes]
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      1d9b2e1e
    • T
      idr: remove MAX_IDR_MASK and move left MAX_IDR_* into idr.c · e8c8d1bc
      Tejun Heo 提交于
      MAX_IDR_MASK is another weirdness in the idr interface.  As idr covers
      whole positive integer range, it's defined as 0x7fffffff or INT_MAX.
      
      Its usage in idr_find(), idr_replace() and idr_remove() is bizarre.
      They basically mask off the sign bit and operate on the rest, so if
      the caller, by accident, passes in a negative number, the sign bit
      will be masked off and the remaining part will be used as if that was
      the input, which is worse than crashing.
      
      The constant is visible in idr.h and there are several users in the
      kernel.
      
      * drivers/i2c/i2c-core.c:i2c_add_numbered_adapter()
      
        Basically used to test if adap->nr is a negative number which isn't
        -1 and returns -EINVAL if so.  idr_alloc() already has negative
        @start checking (w/ WARN_ON_ONCE), so this can go away.
      
      * drivers/infiniband/core/cm.c:cm_alloc_id()
        drivers/infiniband/hw/mlx4/cm.c:id_map_alloc()
      
        Used to wrap cyclic @start.  Can be replaced with max(next, 0).
        Note that this type of cyclic allocation using idr is buggy.  These
        are prone to spurious -ENOSPC failure after the first wraparound.
      
      * fs/super.c:get_anon_bdev()
      
        The ID allocated from ida is masked off before being tested whether
        it's inside valid range.  ida allocated ID can never be a negative
        number and the masking is unnecessary.
      
      Update idr_*() functions to fail with -EINVAL when negative @id is
      specified and update other MAX_IDR_MASK users as described above.
      
      This leaves MAX_IDR_MASK without any user, remove it and relocate
      other MAX_IDR_* constants to lib/idr.c.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: Jean Delvare <khali@linux-fr.org>
      Cc: Roland Dreier <roland@kernel.org>
      Cc: Sean Hefty <sean.hefty@intel.com>
      Cc: Hal Rosenstock <hal.rosenstock@gmail.com>
      Cc: "Marciniszyn, Mike" <mike.marciniszyn@intel.com>
      Cc: Jack Morgenstein <jackm@dev.mellanox.co.il>
      Cc: Or Gerlitz <ogerlitz@mellanox.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Acked-by: NWolfram Sang <wolfram@the-dreams.de>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      e8c8d1bc
    • T
      idr: implement idr_preload[_end]() and idr_alloc() · d5c7409f
      Tejun Heo 提交于
      The current idr interface is very cumbersome.
      
      * For all allocations, two function calls - idr_pre_get() and
        idr_get_new*() - should be made.
      
      * idr_pre_get() doesn't guarantee that the following idr_get_new*()
        will not fail from memory shortage.  If idr_get_new*() returns
        -EAGAIN, the caller is expected to retry pre_get and allocation.
      
      * idr_get_new*() can't enforce upper limit.  Upper limit can only be
        enforced by allocating and then freeing if above limit.
      
      * idr_layer buffer is unnecessarily per-idr.  Each idr ends up keeping
        around MAX_IDR_FREE idr_layers.  The memory consumed per idr is
        under two pages but it makes it difficult to make idr_layer larger.
      
      This patch implements the following new set of allocation functions.
      
      * idr_preload[_end]() - Similar to radix preload but doesn't fail.
        The first idr_alloc() inside preload section can be treated as if it
        were called with @gfp_mask used for idr_preload().
      
      * idr_alloc() - Allocate an ID w/ lower and upper limits.  Takes
        @gfp_flags and can be used w/o preloading.  When used inside
        preloaded section, the allocation mask of preloading can be assumed.
      
      If idr_alloc() can be called from a context which allows sufficiently
      relaxed @gfp_mask, it can be used by itself.  If, for example,
      idr_alloc() is called inside spinlock protected region, preloading can
      be used like the following.
      
      	idr_preload(GFP_KERNEL);
      	spin_lock(lock);
      
      	id = idr_alloc(idr, ptr, start, end, GFP_NOWAIT);
      
      	spin_unlock(lock);
      	idr_preload_end();
      	if (id < 0)
      		error;
      
      which is much simpler and less error-prone than idr_pre_get and
      idr_get_new*() loop.
      
      The new interface uses per-pcu idr_layer buffer and thus the number of
      idr's in the system doesn't affect the amount of memory used for
      preloading.
      
      idr_layer_alloc() is introduced to handle idr_layer allocations for
      both old and new ID allocation paths.  This is a bit hairy now but the
      new interface is expected to replace the old and the internal
      implementation eventually will become simpler.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      d5c7409f
    • T
      idr: remove _idr_rc_to_errno() hack · 12d1b439
      Tejun Heo 提交于
      idr uses -1, IDR_NEED_TO_GROW and IDR_NOMORE_SPACE to communicate
      exception conditions internally.  The return value is later translated
      to errno values using _idr_rc_to_errno().
      
      This is confusing.  Drop the custom ones and consistently use -EAGAIN
      for "tree needs to grow", -ENOMEM for "need more memory" and -ENOSPC for
      "ran out of ID space".
      
      Due to the weird memory preloading mechanism, [ra]_get_new*() return
      -EAGAIN on memory shortage, so we need to substitute -ENOMEM w/
      -EAGAIN on those interface functions.  They'll eventually be cleaned
      up and the translations will go away.
      
      This patch doesn't introduce any functional changes.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      12d1b439
    • T
      idr: relocate idr_for_each_entry() and reorganize id[r|a]_get_new() · 49038ef4
      Tejun Heo 提交于
      * Move idr_for_each_entry() definition next to other idr related
        definitions.
      
      * Make id[r|a]_get_new() inline wrappers of id[r|a]_get_new_above().
      
      This changes the implementation of idr_get_new() but the new
      implementation is trivial.  This patch doesn't introduce any
      functional change.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      49038ef4
    • T
      idr: cosmetic updates to struct / initializer definitions · 4106ecaf
      Tejun Heo 提交于
      * Tab align fields like a normal person.
      
      * Drop the unnecessary 0 inits from IDR_INIT().
      
      This patch is purely cosmetic.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      4106ecaf
    • T
      idr: deprecate idr_remove_all() · fe6e24ec
      Tejun Heo 提交于
      There was only one legitimate use of idr_remove_all() and a lot more of
      incorrect uses (or lack of it).  Now that idr_destroy() implies
      idr_remove_all() and all the in-kernel users updated not to use it,
      there's no reason to keep it around.  Mark it deprecated so that we can
      later unexport it.
      
      idr_remove_all() is made an inline function calling __idr_remove_all()
      to avoid triggering deprecated warning on EXPORT_SYMBOL().
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      fe6e24ec
    • M
      lockdep: check that no locks held at freeze time · 6aa97070
      Mandeep Singh Baines 提交于
      We shouldn't try_to_freeze if locks are held.  Holding a lock can cause a
      deadlock if the lock is later acquired in the suspend or hibernate path
      (e.g.  by dpm).  Holding a lock can also cause a deadlock in the case of
      cgroup_freezer if a lock is held inside a frozen cgroup that is later
      acquired by a process outside that group.
      
      [akpm@linux-foundation.org: export debug_check_no_locks_held]
      Signed-off-by: NMandeep Singh Baines <msb@chromium.org>
      Cc: Ben Chan <benchan@chromium.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Rafael J. Wysocki <rjw@sisk.pl>
      Cc: Ingo Molnar <mingo@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      6aa97070
    • K
      coredump: remove redundant defines for dumpable states · e579d2c2
      Kees Cook 提交于
      The existing SUID_DUMP_* defines duplicate the newer SUID_DUMPABLE_*
      defines introduced in 54b50199 ("coredump: warn about unsafe
      suid_dumpable / core_pattern combo").  Remove the new ones, and use the
      prior values instead.
      Signed-off-by: NKees Cook <keescook@chromium.org>
      Reported-by: NChen Gang <gang.chen@asianux.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Alan Cox <alan@linux.intel.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Doug Ledford <dledford@redhat.com>
      Cc: Serge Hallyn <serge.hallyn@canonical.com>
      Cc: James Morris <james.l.morris@oracle.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      e579d2c2
    • O
      fat: mark fs as dirty on mount and clean on umount · b88a1058
      Oleksij Rempel 提交于
      There is no documented methods to mark FAT as dirty.  Unofficially MS
      started to use reserved Byte in boot sector for this purpose, at least
      since Win 2000.  With Win 7 user is warned if fs is dirty and asked to
      clean it.
      
      Different versions of Win, handle it in different ways, but always have
      same meaning:
      
      - Win 2000 and XP, set it on write operations and
        remove it after operation was finnished
      - Win 7, set dirty flag on first write and remove it on umount.
      
      We will do it as follows:
      
      - set dirty flag on mount. If fs was initially dirty, warn user,
        remember it and do not do any changes to boot sector.
      - clean it on umount. If fs was initially dirty, leave it dirty.
      - do not do any thing if fs mounted read-only.
      - TODO: leave fs dirty if we found some error after mount.
      Signed-off-by: NOleksij Rempel <bug-track@fisher-privat.net>
      Signed-off-by: NOGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b88a1058
    • O
      fat: add extended fileds to struct fat_boot_sector · 6b46419b
      Oleksij Rempel 提交于
      Later we will need "state" field to check if volume was cleanly unmounted.
      Signed-off-by: NOleksij Rempel <bug-track@fisher-privat.net>
      Signed-off-by: NOGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      6b46419b
    • V
      hfsplus: add osx.* prefix for handling namespace of Mac OS X extended attributes · 5841ca09
      Vyacheslav Dubeyko 提交于
      hfsplus: reworked support of extended attributes.
      
      Current mainline implementation of hfsplus file system driver treats as
      extended attributes only two fields (fdType and fdCreator) of user_info
      field in file description record (struct hfsplus_cat_file).  It is
      possible to get or set only these two fields as extended attributes.
      But HFS+ treats as com.apple.FinderInfo extended attribute an union of
      user_info and finder_info fields as for file (struct hfsplus_cat_file)
      as for folder (struct hfsplus_cat_folder).  Moreover, current mainline
      implementation of hfsplus file system driver doesn't support special
      metadata file - attributes tree.
      
      Mac OS X 10.4 and later support extended attributes by making use of the
      HFS+ filesystem Attributes file B*-tree feature which allows for named
      forks.  Mac OS X supports only inline extended attributes, limiting
      their size to 3802 bytes.  Any regular file may have a list of extended
      attributes.  HFS+ supports an arbitrary number of named forks.  Each
      attribute is denoted by a name and the associated data.  The name is a
      null-terminated Unicode string.  It is possible to list, to get, to set,
      and to remove extended attributes from files or directories.
      
      It exists some peculiarity during getting of extended attributes list by
      means of getfattr utility.  The getfattr utility expects prefix "user."
      before any extended attribute's name.  So, it ignores any names that
      don't contained such prefix.  Such behavior of getfattr utility results
      in unexpected empty output of extended attributes list even in the case
      when file (or folder) contains extended attributes.  It needs to use
      empty string as regular expression pattern for names matching (getfattr
      --match="").
      
      For support of extended attributes in HFS+:
      1. It was added necessary on-disk layout declarations related to Attributes
         tree into hfsplus_raw.h file.
      2. It was added attributes.c file with implementation of functionality of
         manipulation by records in Attributes tree.
      3. It was reworked hfsplus_listxattr, hfsplus_getxattr, hfsplus_setxattr
         functions in ioctl.c. Moreover, it was added hfsplus_removexattr method.
      
      This patch:
      
      Add osx.* prefix for handling namespace of Mac OS X extended attributes.
      
      [akpm@linux-foundation.org: checkpatch fixes]
      Signed-off-by: NVyacheslav Dubeyko <slava@dubeyko.com>
      Reported-by: NHin-Tak Leung <htl10@users.sourceforge.net>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Jan Kara <jack@suse.cz>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      5841ca09
    • I
      lib/scatterlist: use page iterator in the mapping iterator · 4225fc85
      Imre Deak 提交于
      For better code reuse use the newly added page iterator to iterate
      through the pages.  The offset, length within the page is still
      calculated by the mapping iterator as well as the actual mapping.  Idea
      from Tejun Heo.
      Signed-off-by: NImre Deak <imre.deak@intel.com>
      Cc: Maxim Levitsky <maximlevitsky@gmail.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
      Cc: James Hogan <james.hogan@imgtec.com>
      Cc: Stephen Warren <swarren@wwwdotorg.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      4225fc85
    • I
      lib/scatterlist: add simple page iterator · a321e91b
      Imre Deak 提交于
      Add an iterator to walk through a scatter list a page at a time starting
      at a specific page offset.  As opposed to the mapping iterator this is
      meant to be small, performing well even in simple loops like collecting
      all pages on the scatterlist into an array or setting up an iommu table
      based on the pages' DMA address.
      Signed-off-by: NImre Deak <imre.deak@intel.com>
      Cc: Maxim Levitsky <maximlevitsky@gmail.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
      Tested-by: NStephen Warren <swarren@wwwdotorg.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      a321e91b
    • K
      backlight: add new lp8788 backlight driver · c5a51053
      Kim, Milo 提交于
      TI LP8788 PMU supports regulators, battery charger, RTC, ADC, backlight
      dri= ver and current sinks.  This patch enables LP8788 backlight module.
      
      (Brightness mode)
      The brightness is controlled by PWM input or I2C register.
      All modes are supported in the driver.
      
      (Platform data)
      Configurable data can be defined in the platform side.
       name                  : backlight driver name. (default: "lcd-backlight")
       initial_brightness    : initial value of backlight brightness
       bl_mode               : brightness control by PWM or lp8788 register
       dim_mode              : dimming mode selection
       full_scale            : full scale current setting
       rise_time             : brightness ramp up step time
       fall_time             : brightness ramp down step time
       pwm_pol               : PWM polarity setting when bl_mode is PWM based
       period_ns             : platform specific PWM period value. unit is nano.
      
      The default values are set in case no platform data is defined.
      
      [akpm@linux-foundation.org: checkpatch fixes]
      Signed-off-by: NMilo(Woogyom) Kim <milo.kim@ti.com>
      Cc: Richard Purdie <rpurdie@rpsys.net>
      Cc: Samuel Ortiz <sameo@linux.intel.com>
      Cc: Thierry Reding <thierry.reding@avionic-design.de>
      Cc: "devendra.aaru" <devendra.aaru@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      c5a51053
  5. 27 2月, 2013 7 次提交
    • D
      dma-buf: implement vmap refcounting in the interface logic · f00b4dad
      Daniel Vetter 提交于
      All drivers which implement this need to have some sort of refcount to
      allow concurrent vmap usage. Hence implement this in the dma-buf core.
      
      To protect against concurrent calls we need a lock, which potentially
      causes new funny locking inversions. But this shouldn't be a problem
      for exporters with statically allocated backing storage, and more
      dynamic drivers have decent issues already anyway.
      
      Inspired by some refactoring patches from Aaron Plattner, who
      implemented the same idea, but only for drm/prime drivers.
      
      v2: Check in dma_buf_release that no dangling vmaps are left.
      Suggested by Aaron Plattner. We might want to do similar checks for
      attachments, but that's for another patch. Also fix up ERR_PTR return
      for vmap.
      
      v3: Check whether the passed-in vmap address matches with the cached
      one for vunmap. Eventually we might want to remove that parameter -
      compared to the kmap functions there's no need for the vaddr for
      unmapping.  Suggested by Chris Wilson.
      
      v4: Fix a brown-paper-bag bug spotted by Aaron Plattner.
      
      Cc: Aaron Plattner <aplattner@nvidia.com>
      Reviewed-by: NAaron Plattner <aplattner@nvidia.com>
      Tested-by: NAaron Plattner <aplattner@nvidia.com>
      Reviewed-by: NRob Clark <rob@ti.com>
      Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
      Signed-off-by: NSumit Semwal <sumit.semwal@linaro.org>
      f00b4dad
    • S
      libceph: add support for HASHPSPOOL pool flag · 83ca14fd
      Sage Weil 提交于
      The legacy behavior adds the pgid seed and pool together as the input for
      CRUSH.  That is problematic because each pool's PGs end up mapping to the
      same OSDs: 1.5 == 2.4 == 3.3 == ...
      
      Instead, if the HASHPSPOOL flag is set, we has the ps and pool together and
      feed that into CRUSH.  This ensures that two adjacent pools will map to
      an independent pseudorandom set of OSDs.
      
      Advertise our support for this via a protocol feature flag.
      Signed-off-by: NSage Weil <sage@inktank.com>
      Reviewed-by: NAlex Elder <elder@inktank.com>
      83ca14fd
    • S
      libceph: update osd request/reply encoding · 1b83bef2
      Sage Weil 提交于
      Use the new version of the encoding for osd requests and replies.  In the
      process, update the way we are tracking request ops and reply lengths and
      results in the struct ceph_osd_request.  Update the rbd and fs/ceph users
      appropriately.
      
      The main changes are:
       - we keep pointers into the request memory for fields we need to update
         each time the request is sent out over the wire
       - we keep information about the result in an array in the request struct
         where the users can easily get at it.
      Signed-off-by: NSage Weil <sage@inktank.com>
      Reviewed-by: NAlex Elder <elder@inktank.com>
      1b83bef2
    • S
      libceph: calculate placement based on the internal data types · 2169aea6
      Sage Weil 提交于
      Instead of using the old ceph_object_layout struct, update our internal
      ceph_calc_object_layout method to use the ceph_pg type.  This allows us to
      pass the full 32-bit precision of the pgid.seed to the callers.  It also
      allows some callers to avoid reaching into the request structures for the
      struct ceph_object_layout fields.
      Signed-off-by: NSage Weil <sage@inktank.com>
      Reviewed-by: NAlex Elder <elder@inktank.com>
      2169aea6
    • S
      ceph: update support for PGID64, PGPOOL3, OSDENC protocol features · 4f6a7e5e
      Sage Weil 提交于
      Support (and require) the PGID64, PGPOOL3, and OSDENC protocol features.
      These have been present in ceph.git since v0.42, Feb 2012.  Require these
      features to simplify support; nobody is running older userspace.
      
      Note that the new request and reply encoding is still not in place, so the new
      code is not yet functional.
      Signed-off-by: NSage Weil <sage@inktank.com>
      Reviewed-by: NAlex Elder <elder@inktank.com>
      4f6a7e5e
    • A
      ceph: update "ceph_features.h" · ec73a754
      Alex Elder 提交于
      This updates "include/linux/ceph/ceph_features.h" so all the feature
      bits defined in the user space code are defined here.
      
      The features supported by this implementation will still differ so
      that's not updated here.
      Signed-off-by: NAlex Elder <elder@inktank.com>
      Reviewed-by: NSage Weil <sage@inktank.com>
      ec73a754
    • S
      libceph: decode into cpu-native ceph_pg type · 5b191d99
      Sage Weil 提交于
      Always decode data into our cpu-native ceph_pg type that has the correct
      field widths.  Limit any remaining uses of ceph_pg_v1 to dealing with the
      legacy protocol.
      Signed-off-by: NSage Weil <sage@inktank.com>
      Reviewed-by: NAlex Elder <elder@inktank.com>
      5b191d99