Commit 54243fec, authored by schneems

Speed up String#blank? Regex

Follow up on https://github.com/rails/rails/commit/697384df36a939e565b7c08725017d49dc83fe40#commitcomment-17184696.

The regex to detect a blank string, `/\A[[:space:]]*\z/`, has to scan every character in the string to confirm that each one is of the `:space:` type. We can invert this logic and look for any non-`:space:` character instead. As soon as one is found the match succeeds, so the regex engine can stop at that first character and does not need to keep looking.

Thanks @nellshamrell for the regex talk at LSRC.

By defining a "blank" string as any string that does not contain a non-whitespace character (yes, a double negative), we can get a substantial speed bump.
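As a quick sanity check that the two definitions agree, here is a minimal sketch comparing them on a handful of inputs. The helper names `blank_by_full_match?` and `blank_by_inverted_match?` are hypothetical and exist only for this illustration; they are not part of Rails.

```ruby
# Hypothetical helpers for illustration only; not part of Rails.

# Old definition: the whole string must consist of whitespace.
def blank_by_full_match?(str)
  str.empty? || /\A[[:space:]]*\z/ === str
end

# New definition: the string must not contain any non-whitespace character.
def blank_by_inverted_match?(str)
  str.empty? || !(/[[:^space:]]/ === str)
end

samples = ["", " ", "\t\n ", "a", "  a  ", "\n\nabc", "abc   "]

# Both predicates should give the same answer for every sample.
samples.each do |s|
  old_result = blank_by_full_match?(s)
  new_result = blank_by_inverted_match?(s)
  raise "mismatch for #{s.inspect}" unless old_result == new_result
end
```

The inverted form can stop at the first non-whitespace character, which is why it tends to win on strings that are not blank.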

Also, an inline regex is (barely) faster than a regex in a constant, since it skips the constant lookup. A regex literal is frozen by default.
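One reason the inline form costs nothing extra is that, in MRI, a regex literal without interpolation is built once and reused rather than allocated on every call. A minimal sketch to observe this, assuming MRI; the method name `inline_regex` is hypothetical:

```ruby
# Hypothetical method for illustration: returns an inline regex literal.
def inline_regex
  /[[:^space:]]/
end

first  = inline_regex
second = inline_regex

# A literal without interpolation evaluates to the same object each time.
puts first.equal?(second)

# Whether the literal reports itself as frozen depends on the Ruby version,
# so this line only prints the answer for the interpreter in use.
puts first.frozen?
```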

```ruby
require 'benchmark/ips'

def string_generate
  str = " abcdefghijklmnopqrstuvwxyz\t".freeze
  str[rand(0..(str.length - 1))] * rand(0..23)
end

strings = 100.times.map { string_generate }

ALL_WHITESPACE_STAR = /\A[[:space:]]*\z/

Benchmark.ips do |x|
  x.report('current regex            ') { strings.each {|str| str.empty? || ALL_WHITESPACE_STAR === str } }
  x.report('+ instead of *           ') { strings.each {|str| str.empty? || /\A[[:space:]]+\z/ === str } }
  x.report('not a non-whitespace char') { strings.each {|str| str.empty? || !(/[[:^space:]]/ === str) } }
  x.compare!
end

# Warming up --------------------------------------
# current regex
#                          1.744k i/100ms
# not a non-whitespace char
#                          2.264k i/100ms
# Calculating -------------------------------------
# current regex
#                          18.078k (± 8.9%) i/s -     90.688k
# not a non-whitespace char
#                          23.580k (± 7.1%) i/s -    117.728k

# Comparison:
# not a non-whitespace char:    23580.3 i/s
# current regex            :    18078.2 i/s - 1.30x slower
```

This makes the method roughly 30% faster on this benchmark: `(23.580 - 18.078) / 18.078 * 100` ≈ 30.4.

cc/ @fxn
Parent 697384df
@@ -112,12 +112,9 @@ class String
   #
   # @return [true, false]
   def blank?
-    # In practice, the majority of blank strings are empty. As of this writing
-    # checking for empty? is about 3.5x faster than matching against the regexp
-    # in MRI, so we call the predicate first, and then fallback.
-    #
-    # The penalty for blank strings with whitespace or present ones is marginal.
-    empty? || BLANK_RE === self
+    # Regex check is slow, only check non-empty strings.
+    # A string not blank if it contains a single non-space string.
+    empty? || !(/[[:^space:]]/ === self)
   end
 end