1. 21 9月, 2011 1 次提交
  2. 16 9月, 2011 2 次提交
    • M
      net: copy userspace buffers on device forwarding · 48c83012
      Michael S. Tsirkin 提交于
      dev_forward_skb loops an skb back into host networking
      stack which might hang on the memory indefinitely.
      In particular, this can happen in macvtap in bridged mode.
      Copy the userspace fragments to avoid blocking the
      sender in that case.
      
      As this patch makes skb_copy_ubufs extern now,
      I also added some documentation and made it clear
      the SKBTX_DEV_ZEROCOPY flag automatically instead
      of doing it in all callers. This can be made into a separate
      patch if people feel it's worth it.
      Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      48c83012
    • E
      tcp: Change possible SYN flooding messages · 946cedcc
      Eric Dumazet 提交于
      "Possible SYN flooding on port xxxx " messages can fill logs on servers.
      
      Change logic to log the message only once per listener, and add two new
      SNMP counters to track :
      
      TCPReqQFullDoCookies : number of times a SYNCOOKIE was replied to client
      
      TCPReqQFullDrop : number of times a SYN request was dropped because
      syncookies were not enabled.
      
      Based on a prior patch from Tom Herbert, and suggestions from David.
      Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
      CC: Tom Herbert <therbert@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      946cedcc
  3. 15 9月, 2011 2 次提交
  4. 09 9月, 2011 1 次提交
  5. 06 9月, 2011 1 次提交
  6. 29 8月, 2011 1 次提交
    • S
      perf events: Fix slow and broken cgroup context switch code · a8d757ef
      Stephane Eranian 提交于
      The current cgroup context switch code was incorrect leading
      to bogus counts. Furthermore, as soon as there was an active
      cgroup event on a CPU, the context switch cost on that CPU
      would increase by a significant amount as demonstrated by a
      simple ping/pong example:
      
       $ ./pong
       Both processes pinned to CPU1, running for 10s
       10684.51 ctxsw/s
      
      Now start a cgroup perf stat:
       $ perf stat -e cycles,cycles -A -a -G test  -C 1 -- sleep 100
      
      $ ./pong
       Both processes pinned to CPU1, running for 10s
       6674.61 ctxsw/s
      
      That's a 37% penalty.
      
      Note that pong is not even in the monitored cgroup.
      
      The results shown by perf stat are bogus:
       $ perf stat -e cycles,cycles -A -a -G test  -C 1 -- sleep 100
      
       Performance counter stats for 'sleep 100':
      
       CPU1 <not counted> cycles   test
       CPU1 16,984,189,138 cycles  #    0.000 GHz
      
      The second 'cycles' event should report a count @ CPU clock
      (here 2.4GHz) as it is counting across all cgroups.
      
      The patch below fixes the bogus accounting and bypasses any
      cgroup switches in case the outgoing and incoming tasks are
      in the same cgroup.
      
      With this patch the same test now yields:
       $ ./pong
       Both processes pinned to CPU1, running for 10s
       10775.30 ctxsw/s
      
      Start perf stat with cgroup:
      
       $ perf stat -e cycles,cycles -A -a -G test  -C 1 -- sleep 10
      
      Run pong outside the cgroup:
       $ /pong
       Both processes pinned to CPU1, running for 10s
       10687.80 ctxsw/s
      
      The penalty is now less than 2%.
      
      And the results for perf stat are correct:
      
      $ perf stat -e cycles,cycles -A -a -G test  -C 1 -- sleep 10
      
       Performance counter stats for 'sleep 10':
      
       CPU1 <not counted> cycles test #    0.000 GHz
       CPU1 23,933,981,448 cycles      #    0.000 GHz
      
      Now perf stat reports the correct counts for
      for the non cgroup event.
      
      If we run pong inside the cgroup, then we also get the
      correct counts:
      
      $ perf stat -e cycles,cycles -A -a -G test  -C 1 -- sleep 10
      
       Performance counter stats for 'sleep 10':
      
       CPU1 22,297,726,205 cycles test #    0.000 GHz
       CPU1 23,933,981,448 cycles      #    0.000 GHz
      
            10.001457237 seconds time elapsed
      Signed-off-by: NStephane Eranian <eranian@google.com>
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/20110825135803.GA4697@quadSigned-off-by: NIngo Molnar <mingo@elte.hu>
      a8d757ef
  7. 27 8月, 2011 1 次提交
  8. 26 8月, 2011 5 次提交
    • D
      backlight: add a callback 'notify_after' for backlight control · cc7993f6
      Dilan Lee 提交于
      We need a callback to do some things after pwm_enable, pwm_disable
      and pwm_config.
      Signed-off-by: NDilan Lee <dilee@nvidia.com>
      Reviewed-by: NRobert Morell <rmorell@nvidia.com>
      Reviewed-by: NArun Murthy <arun.murthy@stericsson.com>
      Cc: Richard Purdie <rpurdie@rpsys.net>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      cc7993f6
    • A
      rapidio: fix use of non-compatible registers · 284fb68d
      Alexandre Bounine 提交于
      Replace/remove use of RIO v.1.2 registers/bits that are not
      forward-compatible with newer versions of RapidIO specification.
      
      RapidIO specification v.1.3 removed Write Port CSR, Doorbell CSR,
      Mailbox CSR and Mailbox and Doorbell bits of the PEF CAR.
      
      Use of removed (since RIO v.1.3) register bits affects users of
      currently available 1.3 and 2.x compliant devices who may use not so
      recent kernel versions.
      
      Removing checks for unsupported bits makes corresponding routines
      compatible with all versions of RapidIO specification.  Therefore,
      backporting makes stable kernel versions compliant with RIO v.1.3 and
      later as well.
      Signed-off-by: NAlexandre Bounine <alexandre.bounine@idt.com>
      Cc: Kumar Gala <galak@kernel.crashing.org>
      Cc: Matt Porter <mporter@kernel.crashing.org>
      Cc: Li Yang <leoli@freescale.com>
      Cc: Thomas Moll <thomas.moll@sysgo.com>
      Cc: Chul Kim <chul.kim@idt.com>
      Cc: <stable@kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      284fb68d
    • E
      a8018766
    • J
      lockdep: Add helper function for dir vs file i_mutex annotation · e096d0c7
      Josh Boyer 提交于
      Purely in-memory filesystems do not use the inode hash as the dcache
      tells us if an entry already exists.  As a result, they do not call
      unlock_new_inode, and thus directory inodes do not get put into a
      different lockdep class for i_sem.
      
      We need the different lockdep classes, because the locking order for
      i_mutex is different for directory inodes and regular inodes.  Directory
      inodes can do "readdir()", which takes i_mutex *before* possibly taking
      mm->mmap_sem (due to a page fault while copying the directory entry to
      user space).
      
      In contrast, regular inodes can be mmap'ed, which takes mm->mmap_sem
      before accessing i_mutex.
      
      The two cases can never happen for the same inode, so no real deadlock
      can occur, but without the different lockdep classes, lockdep cannot
      understand that.  As a result, if CONFIG_DEBUG_LOCK_ALLOC is set, this
      can lead to false positives from lockdep like below:
      
          find/645 is trying to acquire lock:
           (&mm->mmap_sem){++++++}, at: [<ffffffff81109514>] might_fault+0x5c/0xac
      
          but task is already holding lock:
           (&sb->s_type->i_mutex_key#15){+.+.+.}, at: [<ffffffff81149f34>]
          vfs_readdir+0x5b/0xb4
      
          which lock already depends on the new lock.
      
          the existing dependency chain (in reverse order) is:
      
          -> #1 (&sb->s_type->i_mutex_key#15){+.+.+.}:
                [<ffffffff8108ac26>] lock_acquire+0xbf/0x103
                [<ffffffff814db822>] __mutex_lock_common+0x4c/0x361
                [<ffffffff814dbc46>] mutex_lock_nested+0x40/0x45
                [<ffffffff811daa87>] hugetlbfs_file_mmap+0x82/0x110
                [<ffffffff81111557>] mmap_region+0x258/0x432
                [<ffffffff811119dd>] do_mmap_pgoff+0x2ac/0x306
                [<ffffffff81111b4f>] sys_mmap_pgoff+0x118/0x16a
                [<ffffffff8100c858>] sys_mmap+0x22/0x24
                [<ffffffff814e3ec2>] system_call_fastpath+0x16/0x1b
      
          -> #0 (&mm->mmap_sem){++++++}:
                [<ffffffff8108a4bc>] __lock_acquire+0xa1a/0xcf7
                [<ffffffff8108ac26>] lock_acquire+0xbf/0x103
                [<ffffffff81109541>] might_fault+0x89/0xac
                [<ffffffff81149cff>] filldir+0x6f/0xc7
                [<ffffffff811586ea>] dcache_readdir+0x67/0x205
                [<ffffffff81149f54>] vfs_readdir+0x7b/0xb4
                [<ffffffff8114a073>] sys_getdents+0x7e/0xd1
                [<ffffffff814e3ec2>] system_call_fastpath+0x16/0x1b
      
      This patch moves the directory vs file lockdep annotation into a helper
      function that can be called by in-memory filesystems and has hugetlbfs
      call it.
      Signed-off-by: NJosh Boyer <jwboyer@redhat.com>
      Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      e096d0c7
    • A
      Add a personality to report 2.6.x version numbers · be27425d
      Andi Kleen 提交于
      I ran into a couple of programs which broke with the new Linux 3.0
      version.  Some of those were binary only.  I tried to use LD_PRELOAD to
      work around it, but it was quite difficult and in one case impossible
      because of a mix of 32bit and 64bit executables.
      
      For example, all kind of management software from HP doesnt work, unless
      we pretend to run a 2.6 kernel.
      
        $ uname -a
        Linux svivoipvnx001 3.0.0-08107-g97cd98f #1062 SMP Fri Aug 12 18:11:45 CEST 2011 i686 i686 i386 GNU/Linux
      
        $ hpacucli ctrl all show
      
        Error: No controllers detected.
      
        $ rpm -qf /usr/sbin/hpacucli
        hpacucli-8.75-12.0
      
      Another notable case is that Python now reports "linux3" from
      sys.platform(); which in turn can break things that were checking
      sys.platform() == "linux2":
      
        https://bugzilla.mozilla.org/show_bug.cgi?id=664564
      
      It seems pretty clear to me though it's a bug in the apps that are using
      '==' instead of .startswith(), but this allows us to unbreak broken
      programs.
      
      This patch adds a UNAME26 personality that makes the kernel report a
      2.6.40+x version number instead.  The x is the x in 3.x.
      
      I know this is somewhat ugly, but I didn't find a better workaround, and
      compatibility to existing programs is important.
      
      Some programs also read /proc/sys/kernel/osrelease.  This can be worked
      around in user space with mount --bind (and a mount namespace)
      
      To use:
      
        wget ftp://ftp.kernel.org/pub/linux/kernel/people/ak/uname26/uname26.c
        gcc -o uname26 uname26.c
        ./uname26 program
      Signed-off-by: NAndi Kleen <ak@linux.intel.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      be27425d
  9. 24 8月, 2011 1 次提交
    • J
      TTY: pty, fix pty counting · 24d406a6
      Jiri Slaby 提交于
      tty_operations->remove is normally called like:
      queue_release_one_tty
       ->tty_shutdown
         ->tty_driver_remove_tty
           ->tty_operations->remove
      
      However tty_shutdown() is called from queue_release_one_tty() only if
      tty_operations->shutdown is NULL. But for pty, it is not.
      pty_unix98_shutdown() is used there as ->shutdown.
      
      So tty_operations->remove of pty (i.e. pty_unix98_remove()) is never
      called. This results in invalid pty_count. I.e. what can be seen in
      /proc/sys/kernel/pty/nr.
      
      I see this was already reported at:
        https://lkml.org/lkml/2009/11/5/370
      But it was not fixed since then.
      
      This patch is kind of a hackish way. The problem lies in ->install. We
      allocate there another tty (so-called tty->link). So ->install is
      called once, but ->remove twice, for both tty and tty->link. The fix
      here is to count both tty and tty->link and divide the count by 2 for
      user.
      
      And to have ->remove called, let's make tty_driver_remove_tty() global
      and call that from pty_unix98_shutdown() (tty_operations->shutdown).
      
      While at it, let's document that when ->shutdown is defined,
      tty_shutdown() is not called.
      Signed-off-by: NJiri Slaby <jslaby@suse.cz>
      Cc: Alan Cox <alan@linux.intel.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: stable <stable@kernel.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@suse.de>
      24d406a6
  10. 23 8月, 2011 1 次提交
    • P
      drivers:misc:ti-st: platform hooks for chip states · 0d7c5f25
      Pavan Savoy 提交于
      Certain platform specific or Host-WiLink Interface specific actions would be
      required to be taken when the chip is being enabled and after the chip is
      disabled such as configuration of the mux modes for the GPIO of host connected
      to the nshutdown of the chip or relinquishing UART after the chip is disabled.
      
      Similar actions can also be taken when the chip is in deep sleep or when the
      chip is awake. Performance enhancements such as configuring the host to run
      faster when chip is awake and slower when chip is asleep can also be made
      here.
      Signed-off-by: NPavan Savoy <pavan_savoy@ti.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@suse.de>
      0d7c5f25
  11. 19 8月, 2011 1 次提交
    • W
      squeeze max-pause area and drop pass-good area · bb082295
      Wu Fengguang 提交于
      Revert the pass-good area introduced in ffd1f609 ("writeback:
      introduce max-pause and pass-good dirty limits") and make the max-pause
      area smaller and safe.
      
      This fixes ~30% performance regression in the ext3 data=writeback
      fio_mmap_randwrite_64k/fio_mmap_randrw_64k test cases, where there are
      12 JBOD disks, on each disk runs 8 concurrent tasks doing reads+writes.
      
      Using deadline scheduler also has a regression, but not that big as CFQ,
      so this suggests we have some write starvation.
      
      The test logs show that
      
      - the disks are sometimes under utilized
      
      - global dirty pages sometimes rush high to the pass-good area for
        several hundred seconds, while in the mean time some bdi dirty pages
        drop to very low value (bdi_dirty << bdi_thresh).  Then suddenly the
        global dirty pages dropped under global dirty threshold and bdi_dirty
        rush very high (for example, 2 times higher than bdi_thresh). During
        which time balance_dirty_pages() is not called at all.
      
      So the problems are
      
      1) The random writes progress so slow that they break the assumption of
         the max-pause logic that "8 pages per 200ms is typically more than
         enough to curb heavy dirtiers".
      
      2) The max-pause logic ignored task_bdi_thresh and thus opens the possibility
         for some bdi's to over dirty pages, leading to (bdi_dirty >> bdi_thresh)
         and then (bdi_thresh >> bdi_dirty) for others.
      
      3) The higher max-pause/pass-good thresholds somehow leads to the bad
         swing of dirty pages.
      
      The fix is to allow the task to slightly dirty over task_bdi_thresh, but
      no way to exceed bdi_dirty and/or global dirty_thresh.
      
      Tests show that it fixed the JBOD regression completely (both behavior
      and performance), while still being able to cut down large pause times
      in balance_dirty_pages() for single-disk cases.
      Reported-by: NLi Shaohua <shaohua.li@intel.com>
      Tested-by: NLi Shaohua <shaohua.li@intel.com>
      Acked-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NWu Fengguang <fengguang.wu@intel.com>
      bb082295
  12. 18 8月, 2011 2 次提交
  13. 16 8月, 2011 1 次提交
    • J
      block: fix flush machinery for stacking drivers with differring flush flags · 4853abaa
      Jeff Moyer 提交于
      Commit ae1b1539, block: reimplement
      FLUSH/FUA to support merge, introduced a performance regression when
      running any sort of fsyncing workload using dm-multipath and certain
      storage (in our case, an HP EVA).  The test I ran was fs_mark, and it
      dropped from ~800 files/sec on ext4 to ~100 files/sec.  It turns out
      that dm-multipath always advertised flush+fua support, and passed
      commands on down the stack, where those flags used to get stripped off.
      The above commit changed that behavior:
      
      static inline struct request *__elv_next_request(struct request_queue *q)
      {
              struct request *rq;
      
              while (1) {
      -               while (!list_empty(&q->queue_head)) {
      +               if (!list_empty(&q->queue_head)) {
                              rq = list_entry_rq(q->queue_head.next);
      -                       if (!(rq->cmd_flags & (REQ_FLUSH | REQ_FUA)) ||
      -                           (rq->cmd_flags & REQ_FLUSH_SEQ))
      -                               return rq;
      -                       rq = blk_do_flush(q, rq);
      -                       if (rq)
      -                               return rq;
      +                       return rq;
                      }
      
      Note that previously, a command would come in here, have
      REQ_FLUSH|REQ_FUA set, and then get handed off to blk_do_flush:
      
      struct request *blk_do_flush(struct request_queue *q, struct request *rq)
      {
              unsigned int fflags = q->flush_flags; /* may change, cache it */
              bool has_flush = fflags & REQ_FLUSH, has_fua = fflags & REQ_FUA;
              bool do_preflush = has_flush && (rq->cmd_flags & REQ_FLUSH);
              bool do_postflush = has_flush && !has_fua && (rq->cmd_flags &
              REQ_FUA);
              unsigned skip = 0;
      ...
              if (blk_rq_sectors(rq) && !do_preflush && !do_postflush) {
                      rq->cmd_flags &= ~REQ_FLUSH;
      		if (!has_fua)
      			rq->cmd_flags &= ~REQ_FUA;
      	        return rq;
      	}
      
      So, the flush machinery was bypassed in such cases (q->flush_flags == 0
      && rq->cmd_flags & (REQ_FLUSH|REQ_FUA)).
      
      Now, however, we don't get into the flush machinery at all.  Instead,
      __elv_next_request just hands a request with flush and fua bits set to
      the scsi_request_fn, even if the underlying request_queue does not
      support flush or fua.
      
      The agreed upon approach is to fix the flush machinery to allow
      stacking.  While this isn't used in practice (since there is only one
      request-based dm target, and that target will now reflect the flush
      flags of the underlying device), it does future-proof the solution, and
      make it function as designed.
      
      In order to make this work, I had to add a field to the struct request,
      inside the flush structure (to store the original req->end_io).  Shaohua
      had suggested overloading the union with rb_node and completion_data,
      but the completion data is used by device mapper and can also be used by
      other drivers.  So, I didn't see a way around the additional field.
      
      I tested this patch on an HP EVA with both ext4 and xfs, and it recovers
      the lost performance.  Comments and other testers, as always, are
      appreciated.
      
      Cheers,
      Jeff
      Signed-off-by: NJeff Moyer <jmoyer@redhat.com>
      Acked-by: NTejun Heo <tj@kernel.org>
      Signed-off-by: NJens Axboe <jaxboe@fusionio.com>
      4853abaa
  14. 14 8月, 2011 2 次提交
  15. 12 8月, 2011 1 次提交
    • V
      move RLIMIT_NPROC check from set_user() to do_execve_common() · 72fa5997
      Vasiliy Kulikov 提交于
      The patch http://lkml.org/lkml/2003/7/13/226 introduced an RLIMIT_NPROC
      check in set_user() to check for NPROC exceeding via setuid() and
      similar functions.
      
      Before the check there was a possibility to greatly exceed the allowed
      number of processes by an unprivileged user if the program relied on
      rlimit only.  But the check created new security threat: many poorly
      written programs simply don't check setuid() return code and believe it
      cannot fail if executed with root privileges.  So, the check is removed
      in this patch because of too often privilege escalations related to
      buggy programs.
      
      The NPROC can still be enforced in the common code flow of daemons
      spawning user processes.  Most of daemons do fork()+setuid()+execve().
      The check introduced in execve() (1) enforces the same limit as in
      setuid() and (2) doesn't create similar security issues.
      
      Neil Brown suggested to track what specific process has exceeded the
      limit by setting PF_NPROC_EXCEEDED process flag.  With the change only
      this process would fail on execve(), and other processes' execve()
      behaviour is not changed.
      
      Solar Designer suggested to re-check whether NPROC limit is still
      exceeded at the moment of execve().  If the process was sleeping for
      days between set*uid() and execve(), and the NPROC counter step down
      under the limit, the defered execve() failure because NPROC limit was
      exceeded days ago would be unexpected.  If the limit is not exceeded
      anymore, we clear the flag on successful calls to execve() and fork().
      
      The flag is also cleared on successful calls to set_user() as the limit
      was exceeded for the previous user, not the current one.
      
      Similar check was introduced in -ow patches (without the process flag).
      
      v3 - clear PF_NPROC_EXCEEDED on successful calls to set_user().
      Reviewed-by: NJames Morris <jmorris@namei.org>
      Signed-off-by: NVasiliy Kulikov <segoon@openwall.com>
      Acked-by: NNeilBrown <neilb@suse.de>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      72fa5997
  16. 11 8月, 2011 3 次提交
  17. 10 8月, 2011 1 次提交
  18. 09 8月, 2011 3 次提交
  19. 08 8月, 2011 3 次提交
    • M
      fuse: fix flock · 37fb3a30
      Miklos Szeredi 提交于
      Commit a9ff4f87 "fuse: support BSD locking semantics" overlooked a
      number of issues with supporing flock locks over existing POSIX
      locking infrastructure:
      
        - it's not backward compatible, passing flock(2) calls to userspace
          unconditionally (if userspace sets FUSE_POSIX_LOCKS)
      
        - it doesn't cater for the fact that flock locks are automatically
          unlocked on file release
      
        - it doesn't take into account the fact that flock exclusive locks
          (write locks) don't need an fd opened for write.
      
      The last one invalidates the original premise of the patch that flock
      locks can be emulated with POSIX locks.
      
      This patch fixes the first two issues.  The last one needs to be fixed
      in userspace if the filesystem assumed that a write lock will happen
      only on a file operned for write (as in the case of the current fuse
      library).
      Reported-by: NSebastian Pipping <webmaster@hartwork.org>
      Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>
      37fb3a30
    • D
      net: Make userland include of netlink.h more sane. · 6602a4ba
      David S. Miller 提交于
      Currently userland will barf when including linux/netlink.h unless it
      precisely includes sys/socket.h first.  The issue is where the
      definition of "sa_family_t" comes from.
      
      We've been back and forth on how to fix this issue in the past, see:
      
      http://thread.gmane.org/gmane.linux.debian.devel.bugs.general/622621
      http://thread.gmane.org/gmane.linux.network/143380
      
      Ben Hutchings suggested we take a hint from how we handle the
      sockaddr_storage type.  First we define a "__kernel_sa_family_t"
      to linux/socket.h that is always defined.
      
      Then if __KERNEL__ is defined, we also define "sa_family_t" as
      equal to "__kernel_sa_family_t".
      
      Then in places like linux/netlink.h we use __kernel_sa_family_t
      in user visible datastructures.
      Reported-by: NMichel Machado <michel@digirati.com.br>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6602a4ba
    • A
      fix rcu annotations noise in cred.h · 32955148
      Al Viro 提交于
      task->cred is declared as __rcu, and access to other tasks' ->cred is,
      indeed, protected.  Access to current->cred does not need rcu_dereference()
      at all, since only the task itself can change its ->cred.  sparse, of
      course, has no way of knowing that...
      
      Add force-cast in current_cred(), make current_fsuid() et.al. use it.
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      32955148
  20. 07 8月, 2011 5 次提交
    • L
      vfs: optimize inode cache access patterns · 3ddcd056
      Linus Torvalds 提交于
      The inode structure layout is largely random, and some of the vfs paths
      really do care.  The path lookup in particular is already quite D$
      intensive, and profiles show that accessing the 'inode->i_op->xyz'
      fields is quite costly.
      
      We already optimized the dcache to not unnecessarily load the d_op
      structure for members that are often NULL using the DCACHE_OP_xyz bits
      in dentry->d_flags, and this does something very similar for the inode
      ops that are used during pathname lookup.
      
      It also re-orders the fields so that the fields accessed by 'stat' are
      together at the beginning of the inode structure, and roughly in the
      order accessed.
      
      The effect of this seems to be in the 1-2% range for an empty kernel
      "make -j" run (which is fairly kernel-intensive, mostly in filename
      lookup), so it's visible.  The numbers are fairly noisy, though, and
      likely depend a lot on exact microarchitecture.  So there's more tuning
      to be done.
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      3ddcd056
    • L
      vfs: renumber DCACHE_xyz flags, remove some stale ones · 830c0f0e
      Linus Torvalds 提交于
      Gcc tends to generate better code with small integers, including the
      DCACHE_xyz flag tests - so move the common ones to be first in the list.
      Also just remove the unused DCACHE_INOTIFY_PARENT_WATCHED and
      DCACHE_AUTOFS_PENDING values, their users no longer exists in the source
      tree.
      
      And add a "unlikely()" to the DCACHE_OP_COMPARE test, since we want the
      common case to be a nice straight-line fall-through.
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      830c0f0e
    • D
      net: Compute protocol sequence numbers and fragment IDs using MD5. · 6e5714ea
      David S. Miller 提交于
      Computers have become a lot faster since we compromised on the
      partial MD4 hash which we use currently for performance reasons.
      
      MD5 is a much safer choice, and is inline with both RFC1948 and
      other ISS generators (OpenBSD, Solaris, etc.)
      
      Furthermore, only having 24-bits of the sequence number be truly
      unpredictable is a very serious limitation.  So the periodic
      regeneration and 8-bit counter have been removed.  We compute and
      use a full 32-bit sequence number.
      
      For ipv6, DCCP was found to use a 32-bit truncated initial sequence
      number (it needs 43-bits) and that is fixed here as well.
      Reported-by: NDan Kaminsky <dan@doxpara.com>
      Tested-by: NWilly Tarreau <w@1wt.eu>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6e5714ea
    • D
      crypto: Move md5_transform to lib/md5.c · bc0b96b5
      David S. Miller 提交于
      We are going to use this for TCP/IP sequence number and fragment ID
      generation.
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      bc0b96b5
    • M
      lib/sha1: use the git implementation of SHA-1 · 1eb19a12
      Mandeep Singh Baines 提交于
      For ChromiumOS, we use SHA-1 to verify the integrity of the root
      filesystem.  The speed of the kernel sha-1 implementation has a major
      impact on our boot performance.
      
      To improve boot performance, we investigated using the heavily optimized
      sha-1 implementation used in git.  With the git sha-1 implementation, we
      see a 11.7% improvement in boot time.
      
      10 reboots, remove slowest/fastest.
      
      Before:
      
        Mean: 6.58 seconds Stdev: 0.14
      
      After (with git sha-1, this patch):
      
        Mean: 5.89 seconds Stdev: 0.07
      
      The other cool thing about the git SHA-1 implementation is that it only
      needs 64 bytes of stack for the workspace while the original kernel
      implementation needed 320 bytes.
      Signed-off-by: NMandeep Singh Baines <msb@chromium.org>
      Cc: Ramsay Jones <ramsay@ramsay1.demon.co.uk>
      Cc: Nicolas Pitre <nico@cam.org>
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: linux-crypto@vger.kernel.org
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      1eb19a12
  21. 06 8月, 2011 1 次提交
  22. 05 8月, 2011 1 次提交