1. 31 Aug 2011 (2 commits)
  2. 29 Aug 2011 (2 commits)
    • [SCSI] genhd: add a new attribute "alias" in gendisk · a72c5e5e
      Authored by Nao Nishijima
      This patch allows the user to set an "alias" for a disk via the sysfs interface.

      It only adds the new "alias" attribute to the gendisk structure. To show the
      alias instead of the device name in kernel messages, the printk messages need
      to be revised to use alias_name():

      Example:
      (current) printk("disk name is %s\n", disk->disk_name);
      (new)     printk("disk name is %s\n", alias_name(disk));

      The "alias" attribute accepts letters, digits, '-' and '_', and may be up to
      255 bytes long. The attribute is write-once.
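
      As a rough illustration, a write-once sysfs store handler for such an
      attribute could look like the sketch below (disk->alias and the handler
      name are assumptions for illustration, not necessarily what the actual
      patch uses; character-set validation and trailing-newline handling are
      omitted):

      /* Sketch only: write-once "alias" store handler for a gendisk. */
      static ssize_t disk_alias_store(struct device *dev,
                                      struct device_attribute *attr,
                                      const char *buf, size_t count)
      {
              struct gendisk *disk = dev_to_disk(dev);
              size_t len = strnlen(buf, count);

              if (disk->alias)                /* write-once: refuse a rewrite */
                      return -EEXIST;
              if (len == 0 || len > 255)      /* length limit from the changelog */
                      return -EINVAL;

              disk->alias = kstrndup(buf, len, GFP_KERNEL);  /* assumed field */
              if (!disk->alias)
                      return -ENOMEM;
              return count;
      }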
      Suggested-by: James Bottomley <James.Bottomley@HansenPartnership.com>
      Suggested-by: Jon Masters <jcm@redhat.com>
      Signed-off-by: Nao Nishijima <nao.nishijima.xt@hitachi.com>
      Signed-off-by: James Bottomley <JBottomley@Parallels.com>
      a72c5e5e
    • [SCSI] scsi scan: don't fail scans when host is in recovery · 76e4e12f
      Authored by Mike Christie
      If a SCSI scan is already in progress when the device goes into recovery,
      we wait for the recovery to complete: scsi-ml sends inquiries or REPORT
      LUNS, and the queueing code blocks them because the host is not ready.
      However, if the host is already in recovery when a scan is started, the
      scan silently fails and some devices are not added.

      The problem of devices not showing up is easy to hit with FC when running
      tests that disrupt the target controllers. When a controller is disrupted
      (rebooted, firmware being updated, etc.) and the dev loss tmo fires,
      devices are removed. When the problem is then fixed, the rport is scanned
      and the devices should be added back. But if another disruption occurs
      before scanning has started, the devices are not added back; if the
      disruption does not begin until after the scan has started, they are.

      This patch fixes the problem by not failing scans while the host is in
      recovery. We let scsi-ml send the IO and let the queueing and SCSI error
      handling deal with it, just as is done when we enter recovery while a
      scan is in progress.

      For recovery cases where the host is being torn down, the scan still
      fails with this patch, since there is no point in scanning.
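
      Conceptually, the change relaxes the "is scanning allowed" test so that a
      host in transient recovery still accepts scans. A sketch of that idea
      (the actual helper in scsi_scan.c may be shaped differently):

      /* Sketch: allow scans while the host is merely recovering, and
       * refuse only when it is actually being torn down. */
      static int scsi_host_scan_allowed(struct Scsi_Host *shost)
      {
              return shost->shost_state == SHOST_RUNNING ||
                     shost->shost_state == SHOST_RECOVERY;
      }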
      Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
      Signed-off-by: James Bottomley <JBottomley@Parallels.com>
      76e4e12f
  3. 27 Aug 2011 (14 commits)
  4. 18 Aug 2011 (2 commits)
  5. 16 Aug 2011 (1 commit)
    • block: fix flush machinery for stacking drivers with differing flush flags · 4853abaa
      Authored by Jeff Moyer
      Commit ae1b1539 ("block: reimplement FLUSH/FUA to support merge")
      introduced a performance regression when running any sort of fsyncing
      workload using dm-multipath and certain storage (in our case, an HP EVA).
      The test I ran was fs_mark, and it dropped from ~800 files/sec on ext4 to
      ~100 files/sec. It turns out that dm-multipath always advertised
      flush+fua support and passed commands on down the stack, where those
      flags used to get stripped off. The above commit changed that behavior:
      
      static inline struct request *__elv_next_request(struct request_queue *q)
      {
              struct request *rq;
      
              while (1) {
      -               while (!list_empty(&q->queue_head)) {
      +               if (!list_empty(&q->queue_head)) {
                              rq = list_entry_rq(q->queue_head.next);
      -                       if (!(rq->cmd_flags & (REQ_FLUSH | REQ_FUA)) ||
      -                           (rq->cmd_flags & REQ_FLUSH_SEQ))
      -                               return rq;
      -                       rq = blk_do_flush(q, rq);
      -                       if (rq)
      -                               return rq;
      +                       return rq;
                      }
      
      Note that previously, a command with REQ_FLUSH|REQ_FUA set would come in
      here and get handed off to blk_do_flush:
      
      struct request *blk_do_flush(struct request_queue *q, struct request *rq)
      {
              unsigned int fflags = q->flush_flags; /* may change, cache it */
              bool has_flush = fflags & REQ_FLUSH, has_fua = fflags & REQ_FUA;
              bool do_preflush = has_flush && (rq->cmd_flags & REQ_FLUSH);
              bool do_postflush = has_flush && !has_fua &&
                                  (rq->cmd_flags & REQ_FUA);
              unsigned skip = 0;
      ...
              if (blk_rq_sectors(rq) && !do_preflush && !do_postflush) {
                      rq->cmd_flags &= ~REQ_FLUSH;
                      if (!has_fua)
                              rq->cmd_flags &= ~REQ_FUA;
                      return rq;
              }
      
      So, the flush machinery was bypassed in such cases (q->flush_flags == 0
      && rq->cmd_flags & (REQ_FLUSH|REQ_FUA)).
      
      Now, however, we don't get into the flush machinery at all. Instead,
      __elv_next_request just hands a request with the flush and fua bits set
      to scsi_request_fn, even if the underlying request_queue does not support
      flush or fua.
      
      The agreed-upon approach is to fix the flush machinery to allow stacking.
      While this isn't used in practice (since there is only one request-based
      dm target, and that target will now reflect the flush flags of the
      underlying device), it does future-proof the solution and make it
      function as designed.

      In order to make this work, I had to add a field to struct request,
      inside the flush structure (to store the original req->end_io). Shaohua
      had suggested overloading the union with rb_node and completion_data,
      but the completion data is used by device mapper and can also be used by
      other drivers, so I didn't see a way around the additional field.
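
      The addition reads roughly as follows (a sketch based on the description
      above; the surrounding struct request layout is elided):

      /* The flush machinery saves the request's original completion so it
       * can be restored once the flush sequence finishes. */
      struct request {
              /* ... */
              struct {
                      unsigned int seq;              /* flush sequence position */
                      struct list_head list;         /* flush queue linkage */
                      rq_end_io_fn *saved_end_io;    /* original req->end_io */
              } flush;
              /* ... */
      };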
      
      I tested this patch on an HP EVA with both ext4 and xfs, and it recovers
      the lost performance.  Comments and other testers, as always, are
      appreciated.
      
      Cheers,
      Jeff
      Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
      Acked-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
      4853abaa
  6. 14 Aug 2011 (2 commits)
  7. 12 Aug 2011 (2 commits)
    • ASoC: omap: Update e-mail address of Jarkko Nikula · 7ec41ee5
      Authored by Jarkko Nikula
      My gmail account got disabled and I'm not going to reopen it.
      Signed-off-by: Jarkko Nikula <jarkko.nikula@bitmer.com>
      Acked-by: Liam Girdwood <lrg@ti.com>
      Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
      7ec41ee5
    • move RLIMIT_NPROC check from set_user() to do_execve_common() · 72fa5997
      Authored by Vasiliy Kulikov
      The patch http://lkml.org/lkml/2003/7/13/226 introduced an RLIMIT_NPROC
      check in set_user() to catch the NPROC limit being exceeded via setuid()
      and similar functions.

      Before the check, an unprivileged user could greatly exceed the allowed
      number of processes if a program relied on the rlimit alone. But the
      check created a new security threat: many poorly written programs simply
      don't check the setuid() return code and assume it cannot fail when
      executed with root privileges. So this patch removes the check, because
      it has too often enabled privilege escalations through such buggy
      programs.

      NPROC can still be enforced in the common code flow of daemons spawning
      user processes. Most daemons do fork() + setuid() + execve(). The check
      introduced in execve() (1) enforces the same limit as in setuid() and
      (2) doesn't create similar security issues.

      Neil Brown suggested tracking which specific process has exceeded the
      limit by setting a PF_NPROC_EXCEEDED process flag. With this change,
      only that process fails its execve(); the execve() behaviour of other
      processes is unchanged.

      Solar Designer suggested re-checking whether the NPROC limit is still
      exceeded at the moment of execve(). If the process slept for days
      between set*uid() and execve() and the NPROC counter has since dropped
      below the limit, a deferred execve() failure caused by the limit having
      been exceeded days ago would be unexpected. If the limit is no longer
      exceeded, the flag is therefore cleared on successful calls to execve()
      and fork().

      The flag is also cleared on successful calls to set_user(), as the limit
      was exceeded for the previous user, not the current one.
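
      Put together, the execve-time check described above reads roughly like
      this sketch (abbreviated; the exact placement inside do_execve_common()
      is not shown):

      /* Fail only if this task was flagged by set_user() AND the
       * limit is still exceeded right now. */
      if ((current->flags & PF_NPROC_EXCEEDED) &&
          atomic_read(&current_user()->processes) > rlimit(RLIMIT_NPROC)) {
              retval = -EAGAIN;
              goto out_ret;
      }

      /* We're below the limit (still or again), so don't make further
       * execve() calls fail. */
      current->flags &= ~PF_NPROC_EXCEEDED;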
      
      A similar check was introduced in the -ow patches (without the process
      flag).

      v3: clear PF_NPROC_EXCEEDED on successful calls to set_user().
      Reviewed-by: James Morris <jmorris@namei.org>
      Signed-off-by: Vasiliy Kulikov <segoon@openwall.com>
      Acked-by: NeilBrown <neilb@suse.de>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      72fa5997
  8. 11 Aug 2011 (2 commits)
  9. 10 Aug 2011 (1 commit)
  10. 09 Aug 2011 (3 commits)
  11. 08 Aug 2011 (4 commits)
  12. 07 Aug 2011 (5 commits)
    • vfs: optimize inode cache access patterns · 3ddcd056
      Authored by Linus Torvalds
      The inode structure layout is largely random, and some of the vfs paths
      really do care.  The path lookup in particular is already quite D$
      intensive, and profiles show that accessing the 'inode->i_op->xyz'
      fields is quite costly.
      
      We already optimized the dcache to avoid unnecessarily loading the d_op
      structure for members that are often NULL, using the DCACHE_OP_xyz bits
      in dentry->d_flags; this patch does something very similar for the inode
      ops that are used during pathname lookup.
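
      The pattern, sketched (the flag name, its value, and the helper below
      are illustrative, borrowed from the dcache analogy, and may not match
      the patch exactly):

      /* Cache "which inode ops are non-NULL" in a flags word once, so the
       * lookup hot path tests a bit instead of chasing i_op pointers. */
      #define IOP_LOOKUP      0x0002          /* illustrative flag value */

      static inline int can_lookup(struct inode *inode)
      {
              return inode->i_opflags & IOP_LOOKUP;   /* assumed field name */
      }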
      
      It also re-orders the fields so that the fields accessed by 'stat' are
      together at the beginning of the inode structure, and roughly in the
      order accessed.
      
      The effect of this seems to be in the 1-2% range for an empty kernel
      "make -j" run (which is fairly kernel-intensive, mostly in filename
      lookup), so it's visible.  The numbers are fairly noisy, though, and
      likely depend a lot on exact microarchitecture.  So there's more tuning
      to be done.
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      3ddcd056
    • vfs: renumber DCACHE_xyz flags, remove some stale ones · 830c0f0e
      Authored by Linus Torvalds
      Gcc tends to generate better code with small integers, including for the
      DCACHE_xyz flag tests, so move the most common ones to the front of the
      list. Also remove the unused DCACHE_INOTIFY_PARENT_WATCHED and
      DCACHE_AUTOFS_PENDING values; their users no longer exist in the source
      tree.

      And add an "unlikely()" to the DCACHE_OP_COMPARE test, since we want the
      common case to be a nice straight-line fall-through.
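
      That test then has roughly this shape (sketch; the d_compare call site
      itself is elided):

      /* Only filesystems that install ->d_compare take the branch; the
       * common case falls straight through. */
      if (unlikely(parent->d_flags & DCACHE_OP_COMPARE)) {
              /* ... call parent->d_op->d_compare() ... */
      }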
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      830c0f0e
    • exofs: Rename raid engine from exofs/ios.c => ore · 8ff660ab
      Authored by Boaz Harrosh
      ORE stands for "Objects Raid Engine"
      
      This patch is a mechanical rename of everything that was in ios.c, and
      of its API declarations, to ore.c and a new osd_ore.h header. The ore
      engine will later be used by the pnfs objects layout driver.
      
      * File ios.c => ore.c

      * Declarations of types and the API are moved from exofs.h to a new
        osd_ore.h

      * All used types are renamed from their exofs_ prefix to ore_ (see the
        example after this list).

      * Includes are shifted from exofs.h to osd_ore.h so that osd_ore.h is
        independent; exofs.h includes it.
      
      Other than the pure rename there are no other changes. The next patch
      will move the ore into its own module and export the API to be used by
      exofs and, later, the layout driver.
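
      For instance, the prefix change amounts to pairs like the following
      (illustrative example only; the exact signatures may differ from the
      patch):

      /* before, in exofs.h */
      struct exofs_io_state;
      int exofs_get_io_state(struct exofs_layout *layout,
                             struct exofs_io_state **ios);

      /* after, in osd_ore.h */
      struct ore_io_state;
      int ore_get_io_state(struct ore_layout *layout,
                           struct ore_io_state **ios);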
      Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
      8ff660ab
    • net: Compute protocol sequence numbers and fragment IDs using MD5. · 6e5714ea
      Authored by David S. Miller
      Computers have become a lot faster since we compromised on the partial
      MD4 hash that we currently use for performance reasons.

      MD5 is a much safer choice, and is in line with both RFC 1948 and other
      ISS generators (OpenBSD, Solaris, etc.).

      Furthermore, having only 24 bits of the sequence number be truly
      unpredictable is a very serious limitation, so the periodic regeneration
      and the 8-bit counter have been removed. We compute and use a full
      32-bit sequence number.

      For ipv6, DCCP was found to use a 32-bit truncated initial sequence
      number (it needs 43 bits), and that is fixed here as well.
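
      The RFC 1948 style of generator this follows looks roughly like the
      sketch below (simplified relative to the kernel's
      secure_tcp_sequence_number(); net_secret stands for the per-boot random
      secret, and the final addition is the usual "slowly ticking clock" step):

      static u32 net_secret[MD5_MESSAGE_BYTES / 4];   /* seeded at boot */

      /* ISN = MD5(saddr, daddr, ports, secret) + a slowly ticking clock:
       * unpredictable across connections, still monotonic per 4-tuple. */
      static u32 isn_sketch(__be32 saddr, __be32 daddr,
                            __be16 sport, __be16 dport)
      {
              u32 hash[MD5_DIGEST_WORDS];

              hash[0] = (__force u32)saddr;
              hash[1] = (__force u32)daddr;
              hash[2] = ((__force u16)sport << 16) + (__force u16)dport;
              hash[3] = net_secret[15];

              md5_transform(hash, net_secret);

              return hash[0] + (u32)(ktime_to_ns(ktime_get_real()) >> 6);
      }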
      Reported-by: Dan Kaminsky <dan@doxpara.com>
      Tested-by: Willy Tarreau <w@1wt.eu>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      6e5714ea
    • crypto: Move md5_transform to lib/md5.c · bc0b96b5
      Authored by David S. Miller
      We are going to use this for TCP/IP sequence number and fragment ID
      generation.
      Signed-off-by: David S. Miller <davem@davemloft.net>
      bc0b96b5