1. 21 Jun, 2009 1 commit
  2. 19 Jun, 2009 1 commit
  3. 02 Jun, 2009 2 commits
    • crypto: aes-ni - Add support for more modes · 2cf4ac8b
      Committed by Huang Ying
      Because kernel_fpu_begin() and kernel_fpu_end() are slow, their
      overhead cancels out almost all of the performance gain of the
      general mode implementation + aes-aesni.
      
      AES-NI support for more modes is implemented as follows:
      
      - Add a new AES algorithm implementation named __aes-aesni without
        kernel_fpu_begin/end()
      
      - Use fpu(<mode>(AES)) to provide the kernel_fpu_begin/end() invocation
      
      - Add <mode>(AES) ablkcipher, which uses cryptd(fpu(<mode>(AES))) to
        defer encryption/decryption to cryptd context in soft_irq context.
      
      Support for ctr, lrw, pcbc and xts is now added.
      
      Performance testing based on dm-crypt shows that encryption/decryption
      time can be reduced to 50% of that of the general mode implementation +
      aes-aesni.
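
The layering described in this commit message can be pictured as nested template names. The following is a minimal sketch in Python, not kernel code: `wrap` and `layered_name` are hypothetical helpers, and the cipher placeholder "AES" is copied from the commit message rather than from any real driver name.

```python
# Hypothetical illustration of the template nesting named in the commit
# message: cryptd(fpu(<mode>(AES))). This only builds strings.

MODES = ("ctr", "lrw", "pcbc", "xts")  # modes listed in the commit message


def wrap(template: str, inner: str) -> str:
    """Wrap an inner algorithm name in a template name."""
    return f"{template}({inner})"


def layered_name(mode: str, cipher: str = "AES") -> str:
    """Build the name from the inside out: mode, then fpu, then cryptd."""
    return wrap("cryptd", wrap("fpu", wrap(mode, cipher)))


if __name__ == "__main__":
    for mode in MODES:
        print(layered_name(mode))
```

Reading a name from the inside out gives the order of responsibilities stated above: the mode drives the cipher, fpu brackets the work with kernel_fpu_begin/end(), and cryptd defers the work to its own context.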
      Signed-off-by: Huang Ying <ying.huang@intel.com>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
    • crypto: fpu - Add template for blkcipher touching FPU · 150c7e85
      Committed by Huang Ying
      Blkciphers that touch the FPU need to be enclosed by kernel_fpu_begin()
      and kernel_fpu_end(). If these are invoked in the cipher algorithm
      implementation, they are invoked for each block, which hurts
      performance because they are "slow" operations. This patch implements
      the "fpu" template, which invokes these operations once per request
      instead.
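
The difference between per-block and per-request bracketing can be shown with a small counting model. This is a sketch in Python under stated assumptions, not the kernel implementation: `FpuCounter` merely stands in for kernel_fpu_begin()/kernel_fpu_end(), `toy_block_cipher` is a placeholder XOR rather than AES, and the 16-byte block size and 4096-byte request are illustrative choices.

```python
# Counting model: how often the "slow" begin/end pair runs when it brackets
# each block versus each whole request. All names here are hypothetical.

BLOCK_SIZE = 16  # assumed block size in bytes, for illustration only


class FpuCounter:
    """Stand-in for kernel_fpu_begin()/kernel_fpu_end(); only counts calls."""

    def __init__(self):
        self.begin_calls = 0
        self.end_calls = 0

    def begin(self):
        self.begin_calls += 1

    def end(self):
        self.end_calls += 1


def toy_block_cipher(block: bytes) -> bytes:
    """Placeholder transform (XOR with 0xFF); this is not AES."""
    return bytes(b ^ 0xFF for b in block)


def split_blocks(data: bytes):
    """Split a request into fixed-size blocks."""
    return [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]


def encrypt_per_block(data: bytes, fpu: FpuCounter) -> bytes:
    """Bracket every block, as when begin/end live inside the cipher."""
    out = []
    for block in split_blocks(data):
        fpu.begin()
        out.append(toy_block_cipher(block))
        fpu.end()
    return b"".join(out)


def encrypt_per_request(data: bytes, fpu: FpuCounter) -> bytes:
    """Bracket the whole request once, as the "fpu" template is described."""
    fpu.begin()
    try:
        return b"".join(toy_block_cipher(b) for b in split_blocks(data))
    finally:
        fpu.end()


if __name__ == "__main__":
    request = bytes(4096)
    per_block, per_request = FpuCounter(), FpuCounter()
    assert encrypt_per_block(request, per_block) == encrypt_per_request(request, per_request)
    print("per block:  ", per_block.begin_calls, "begin/end pairs")
    print("per request:", per_request.begin_calls, "begin/end pair")
```

Both paths produce the same output; only the number of begin/end pairs differs, which is the overhead the commit message describes as hurting performance.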
      Signed-off-by: Huang Ying <ying.huang@intel.com>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
  4. 04 Mar, 2009 3 commits
  5. 19 Feb, 2009 3 commits
    • crypto: chainiv - Use kcrypto_wq instead of keventd_wq · 0a2e821d
      Committed by Huang Ying
      keventd_wq has a potential starvation problem, so use the dedicated
      kcrypto_wq instead.
      Signed-off-by: Huang Ying <ying.huang@intel.com>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
    • crypto: cryptd - Per-CPU thread implementation based on kcrypto_wq · 254eff77
      Committed by Huang Ying
      The original cryptd thread implementation has a scalability issue; this
      patch solves it with a per-CPU thread implementation.
      
      struct cryptd_queue is defined as a per-CPU queue, which holds one
      struct cryptd_cpu_queue for each CPU. In struct cryptd_cpu_queue, a
      struct crypto_queue holds all requests for the CPU, and a struct
      work_struct is used to run all requests for the CPU.
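
The data layout described above can be modelled in a few lines. This is a minimal Python sketch, not the kernel structures: `PerCpuQueue` and `CpuQueue` are hypothetical stand-ins for struct cryptd_queue and struct cryptd_cpu_queue, a deque stands in for struct crypto_queue, and the `work` method stands in for the work_struct handler. Draining the whole queue in one call is an assumption taken from the wording "run all requests for the CPU".

```python
# Toy model of a per-CPU request queue. Names and behaviour are
# illustrative assumptions, not the kernel's actual code.
from collections import deque


class CpuQueue:
    """Models struct cryptd_cpu_queue: one request queue plus one work item."""

    def __init__(self, cpu: int):
        self.cpu = cpu
        self.requests = deque()  # stands in for struct crypto_queue
        self.completed = []      # requests already processed, in order

    def work(self):
        """Stands in for the work_struct handler: drain this CPU's queue."""
        while self.requests:
            self.completed.append(self.requests.popleft())


class PerCpuQueue:
    """Models struct cryptd_queue: holds one CpuQueue for each CPU."""

    def __init__(self, num_cpus: int):
        self.cpu_queues = [CpuQueue(cpu) for cpu in range(num_cpus)]

    def enqueue(self, cpu: int, request):
        """Queue a request on the CPU that submitted it."""
        self.cpu_queues[cpu].requests.append(request)

    def run_all(self):
        """Run every CPU's work item (sequentially here, for simplicity)."""
        for cpu_queue in self.cpu_queues:
            cpu_queue.work()


if __name__ == "__main__":
    queue = PerCpuQueue(num_cpus=2)
    queue.enqueue(0, "req-a")
    queue.enqueue(1, "req-b")
    queue.enqueue(0, "req-c")
    queue.run_all()
    for cpu_queue in queue.cpu_queues:
        print(cpu_queue.cpu, cpu_queue.completed)
```

Because each CPU only touches its own queue, requests submitted on different CPUs do not contend for a single shared queue, which is the scalability point the commit message makes.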
      
      Testing based on dm-crypt on an Intel Core 2 E6400 (two cores) machine
      shows a 19.2% performance gain. The testing script is as follows:
      
      -------------------- script begin ---------------------------
      #!/bin/sh
      
      dmc_create()
      {
              # Create a crypt device using dmsetup
              dmsetup create "$2" --table "0 $(blockdev --getsize "$1") crypt cbc(aes-asm)?cryptd?plain:plain babebabebabebabebabebabebabebabe 0 $1 0"
      }
      
      dmsetup remove crypt0
      dmsetup remove crypt1
      
      dd if=/dev/zero of=/dev/ram0 bs=1M count=4 >/dev/null 2>&1
      dd if=/dev/zero of=/dev/ram1 bs=1M count=4 >/dev/null 2>&1
      
      dmc_create /dev/ram0 crypt0
      dmc_create /dev/ram1 crypt1
      
      cat >tr.sh <<EOF
      #!/bin/sh
      
      for n in \$(seq 10); do
              dd if=/dev/dm-0 of=/dev/null >/dev/null 2>&1 &
              dd if=/dev/dm-1 of=/dev/null >/dev/null 2>&1 &
      done
      wait
      EOF
      
      for n in $(seq 10); do
              /usr/bin/time sh tr.sh
      done
      rm tr.sh
      -------------------- script end   ---------------------------
      
      The separator of the dm-crypt parameter is changed from "-" to "?", because
      "-" is also used in some cipher driver names, and cryptd needs to specify
      the cipher driver name instead of the cipher name.
      
      The test result on an Intel Core 2 E6400 (two cores) is as follows:
      
      without patch:
      -----------------wo begin --------------------------
      0.04user 0.38system 0:00.39elapsed 107%CPU (0avgtext+0avgdata 0maxresident)k
      0inputs+0outputs (0major+6566minor)pagefaults 0swaps
      0.07user 0.35system 0:00.35elapsed 121%CPU (0avgtext+0avgdata 0maxresident)k
      0inputs+0outputs (0major+6567minor)pagefaults 0swaps
      0.06user 0.34system 0:00.30elapsed 135%CPU (0avgtext+0avgdata 0maxresident)k
      0inputs+0outputs (0major+6562minor)pagefaults 0swaps
      0.05user 0.37system 0:00.36elapsed 119%CPU (0avgtext+0avgdata 0maxresident)k
      0inputs+0outputs (0major+6607minor)pagefaults 0swaps
      0.06user 0.36system 0:00.35elapsed 120%CPU (0avgtext+0avgdata 0maxresident)k
      0inputs+0outputs (0major+6562minor)pagefaults 0swaps
      0.05user 0.37system 0:00.31elapsed 136%CPU (0avgtext+0avgdata 0maxresident)k
      0inputs+0outputs (0major+6594minor)pagefaults 0swaps
      0.04user 0.34system 0:00.30elapsed 126%CPU (0avgtext+0avgdata 0maxresident)k
      0inputs+0outputs (0major+6597minor)pagefaults 0swaps
      0.06user 0.32system 0:00.31elapsed 125%CPU (0avgtext+0avgdata 0maxresident)k
      0inputs+0outputs (0major+6571minor)pagefaults 0swaps
      0.06user 0.34system 0:00.31elapsed 134%CPU (0avgtext+0avgdata 0maxresident)k
      0inputs+0outputs (0major+6581minor)pagefaults 0swaps
      0.05user 0.38system 0:00.31elapsed 138%CPU (0avgtext+0avgdata 0maxresident)k
      0inputs+0outputs (0major+6600minor)pagefaults 0swaps
      -----------------wo end   --------------------------
      
      
      with patch:
      ------------------w begin --------------------------
      0.02user 0.31system 0:00.24elapsed 141%CPU (0avgtext+0avgdata 0maxresident)k
      0inputs+0outputs (0major+6554minor)pagefaults 0swaps
      0.05user 0.34system 0:00.31elapsed 127%CPU (0avgtext+0avgdata 0maxresident)k
      0inputs+0outputs (0major+6606minor)pagefaults 0swaps
      0.07user 0.33system 0:00.26elapsed 155%CPU (0avgtext+0avgdata 0maxresident)k
      0inputs+0outputs (0major+6559minor)pagefaults 0swaps
      0.07user 0.32system 0:00.26elapsed 151%CPU (0avgtext+0avgdata 0maxresident)k
      0inputs+0outputs (0major+6562minor)pagefaults 0swaps
      0.05user 0.34system 0:00.26elapsed 150%CPU (0avgtext+0avgdata 0maxresident)k
      0inputs+0outputs (0major+6603minor)pagefaults 0swaps
      0.03user 0.36system 0:00.31elapsed 124%CPU (0avgtext+0avgdata 0maxresident)k
      0inputs+0outputs (0major+6562minor)pagefaults 0swaps
      0.04user 0.35system 0:00.26elapsed 147%CPU (0avgtext+0avgdata 0maxresident)k
      0inputs+0outputs (0major+6586minor)pagefaults 0swaps
      0.03user 0.37system 0:00.27elapsed 146%CPU (0avgtext+0avgdata 0maxresident)k
      0inputs+0outputs (0major+6562minor)pagefaults 0swaps
      0.04user 0.36system 0:00.26elapsed 154%CPU (0avgtext+0avgdata 0maxresident)k
      0inputs+0outputs (0major+6594minor)pagefaults 0swaps
      0.04user 0.35system 0:00.26elapsed 154%CPU (0avgtext+0avgdata 0maxresident)k
      0inputs+0outputs (0major+6557minor)pagefaults 0swaps
      ------------------w end   --------------------------
      
      The median elapsed time is:
      wo cryptwq: 0.31
      w  cryptwq: 0.26
      
      The performance gain is about (0.31-0.26)/0.26 = 0.192.
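
The two summary values and the gain figure can be rechecked from the elapsed times in the logs above. The following Python sketch copies the ten elapsed values from each `time` log; the variable names and the `relative_gain` helper are introduced here for illustration only.

```python
# Recompute the summary figures from the elapsed times listed in the logs.
from statistics import median

# Elapsed seconds, in log order, copied from the two result blocks.
without_patch = [0.39, 0.35, 0.30, 0.36, 0.35, 0.31, 0.30, 0.31, 0.31, 0.31]
with_patch = [0.24, 0.31, 0.26, 0.26, 0.26, 0.31, 0.26, 0.27, 0.26, 0.26]


def relative_gain(baseline: float, improved: float) -> float:
    """Gain as computed in the commit message: (old - new) / new."""
    return (baseline - improved) / improved


if __name__ == "__main__":
    wo = median(without_patch)
    w = median(with_patch)
    print(f"wo cryptwq: {wo:.2f}")
    print(f"w  cryptwq: {w:.2f}")
    print(f"gain: {relative_gain(wo, w):.3f}")
```

Note that the gain is expressed relative to the patched time. Relative to the unpatched time, the same measurements correspond to a reduction in elapsed time of (0.31-0.26)/0.31, or about 16%.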
      Signed-off-by: Huang Ying <ying.huang@intel.com>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
    • crypto: api - Use dedicated workqueue for crypto subsystem · 25c38d3f
      Committed by Huang Ying
      Use dedicated workqueue for crypto subsystem
      
      A dedicated workqueue named kcrypto_wq is created for use by the crypto
      subsystem. The system-wide shared keventd_wq is not suitable for
      encryption/decryption because of a potential starvation problem.
      Signed-off-by: Huang Ying <ying.huang@intel.com>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
  6. 18 Feb, 2009 1 commit
  7. 25 Dec, 2008 14 commits
  8. 10 Dec, 2008 1 commit
  9. 29 Aug, 2008 6 commits
  10. 15 Jul, 2008 1 commit
  11. 10 Jul, 2008 5 commits
  12. 21 Apr, 2008 2 commits