Commit 5d180fdf authored by Rafael Mendonça França

Merge pull request #12257 from vipulnsward/end_on_find_in_batches

Add an option `end` to `find_in_batches`
* `find_in_batches` now accepts an `:end_at` parameter that complements the `:start`
  parameter to specify where to stop batch processing.

  *Vipul A M*
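For illustration, a minimal sketch of the new option in use (the `Person` model and the block body are assumptions, not part of the commit):

```ruby
# Only records whose primary key falls in 2000..10000 are yielded;
# both :start and :end_at bounds are inclusive.
Person.find_in_batches(start: 2000, end_at: 10000, batch_size: 1000) do |batch|
  batch.each { |person| person.touch }
end
```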
* Fix rounding problem for PostgreSQL timestamp columns.

  If a timestamp column has a precision, it needs to be formatted according to
......
@@ -27,11 +27,12 @@ module Batches
   #
   # ==== Options
   # * <tt>:batch_size</tt> - Specifies the size of the batch. Defaults to 1000.
-  # * <tt>:start</tt> - Specifies the primary key value to start from.
+  # * <tt>:start</tt> - Specifies the primary key value to start from, inclusive of the value.
+  # * <tt>:end_at</tt> - Specifies the primary key value to end at, inclusive of the value.
   # This is especially useful if you want multiple workers dealing with
   # the same processing queue. You can make worker 1 handle all the records
   # between id 0 and 10,000 and worker 2 handle from 10,000 and beyond
-  # (by setting the +:start+ option on that worker).
+  # (by setting the +:start+ and +:end_at+ options on each worker).
   #
   #   # Let's process for a batch of 2000 records, skipping the first 2000 rows
   #   Person.find_each(start: 2000, batch_size: 2000) do |person|
@@ -45,14 +46,15 @@ module Batches
   #
   # NOTE: You can't set the limit either, that's used to control
   # the batch sizes.
-  def find_each(start: nil, batch_size: 1000)
+  def find_each(start: nil, end_at: nil, batch_size: 1000)
     if block_given?
-      find_in_batches(start: start, batch_size: batch_size) do |records|
+      find_in_batches(start: start, end_at: end_at, batch_size: batch_size) do |records|
         records.each { |record| yield record }
       end
     else
-      enum_for(:find_each, start: start, batch_size: batch_size) do
-        start ? where(table[primary_key].gteq(start)).size : size
+      enum_for(:find_each, start: start, end_at: end_at, batch_size: batch_size) do
+        relation = self
+        apply_limits(relation, start, end_at).size
       end
     end
   end
@@ -77,11 +79,12 @@ def find_each(start: nil, batch_size: 1000)
   #
   # ==== Options
   # * <tt>:batch_size</tt> - Specifies the size of the batch. Defaults to 1000.
-  # * <tt>:start</tt> - Specifies the primary key value to start from.
+  # * <tt>:start</tt> - Specifies the primary key value to start from, inclusive of the value.
+  # * <tt>:end_at</tt> - Specifies the primary key value to end at, inclusive of the value.
   # This is especially useful if you want multiple workers dealing with
   # the same processing queue. You can make worker 1 handle all the records
   # between id 0 and 10,000 and worker 2 handle from 10,000 and beyond
-  # (by setting the +:start+ option on that worker).
+  # (by setting the +:start+ and +:end_at+ options on each worker).
   #
   #   # Let's process the next 2000 records
   #   Person.find_in_batches(start: 2000, batch_size: 2000) do |group|
@@ -95,12 +98,12 @@ def find_each(start: nil, batch_size: 1000)
   #
   # NOTE: You can't set the limit either, that's used to control
   # the batch sizes.
-  def find_in_batches(start: nil, batch_size: 1000)
+  def find_in_batches(start: nil, end_at: nil, batch_size: 1000)
     relation = self
     unless block_given?
-      return to_enum(:find_in_batches, start: start, batch_size: batch_size) do
-        total = start ? where(table[primary_key].gteq(start)).size : size
+      return to_enum(:find_in_batches, start: start, end_at: end_at, batch_size: batch_size) do
+        total = apply_limits(relation, start, end_at).size
        (total - 1).div(batch_size) + 1
       end
     end
@@ -110,7 +113,8 @@ def find_in_batches(start: nil, batch_size: 1000)
     end

     relation = relation.reorder(batch_order).limit(batch_size)
-    records = start ? relation.where(table[primary_key].gteq(start)).to_a : relation.to_a
+    relation = apply_limits(relation, start, end_at)
+    records = relation.to_a

     while records.any?
       records_size = records.size
@@ -127,6 +131,12 @@ def find_in_batches(start: nil, batch_size: 1000)
   private

+  def apply_limits(relation, start, end_at)
+    relation = relation.where(table[primary_key].gteq(start)) if start
+    relation = relation.where(table[primary_key].lteq(end_at)) if end_at
+    relation
+  end
+
   def batch_order
     "#{quoted_table_name}.#{quoted_primary_key} ASC"
   end
......
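As a hedged illustration of what `apply_limits` changes for the block-less enumerator form (the `Person` model is an assumption):

```ruby
# find_each without a block returns an Enumerator; its size is now
# computed from the windowed relation rather than the whole table.
enum = Person.find_each(start: 100, end_at: 200)
enum.size # => count of records with ids in 100..200 (both bounds inclusive)
```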
@@ -106,6 +106,15 @@ def test_find_in_batches_should_start_from_the_start_option
     end
   end

+  def test_find_in_batches_should_end_at_the_end_option
+    assert_queries(6) do
+      Post.find_in_batches(batch_size: 1, end_at: 5) do |batch|
+        assert_kind_of Array, batch
+        assert_kind_of Post, batch.first
+      end
+    end
+  end
+
   def test_find_in_batches_shouldnt_execute_query_unless_needed
     assert_queries(2) do
       Post.find_in_batches(:batch_size => @total) {|batch| assert_kind_of Array, batch }
......
@@ -343,6 +343,19 @@ end
Another example would be if you wanted multiple workers handling the same processing queue. You could have each worker handle 10000 records by setting the appropriate `:start` option on each worker.
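As a hedged sketch of that worker split (the two-worker setup and the `User` model are assumptions, not part of the guide):

```ruby
# Worker 1: handles ids 1 through 10000 (end_at is inclusive).
User.find_each(start: 1, end_at: 10000, batch_size: 1000) do |user|
  # process user
end

# Worker 2: picks up at the next id and runs to the end of the table.
User.find_each(start: 10001, batch_size: 1000) do |user|
  # process user
end
```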
**`:end_at`**

Similar to the `:start` option, `:end_at` allows you to configure the last ID of the sequence whenever the highest ID is not the one you need.

This would be useful, for example, if you wanted to run a batch process using a subset of records based on `:start` and `:end_at`.

For example, to send newsletters only to users with the primary key starting from 2000 up to 10000 and to retrieve them in batches of 5000:
```ruby
User.find_each(start: 2000, end_at: 10000, batch_size: 5000) do |user|
NewsMailer.weekly(user).deliver_now
end
```
#### `find_in_batches`
The `find_in_batches` method is similar to `find_each`, since both retrieve batches of records. The difference is that `find_in_batches` yields _batches_ to the block as an array of models, instead of individually. The following example will yield to the supplied block an array of up to 1000 invoices at a time, with the final block containing any remaining invoices:
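The example itself is elided between the hunks above; a minimal sketch of such a call, assuming an `Invoice` model and an `export` object that are not part of this diff:

```ruby
# Hand the block an array of up to 1000 invoices per iteration;
# the last array holds whatever remainder is left.
Invoice.find_in_batches do |invoices|
  export.add_invoices(invoices)
end
```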
@@ -356,7 +369,7 @@ end
##### Options for `find_in_batches`
The `find_in_batches` method accepts the same `:batch_size`, `:start` and `:end_at` options as `find_each`.
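For illustration, a hedged sketch combining all three options (the `User` model and `NewsMailer` are reused from the guide's earlier example):

```ruby
# Yield arrays of up to 1000 users whose ids fall in 2000..10000.
User.find_in_batches(start: 2000, end_at: 10000, batch_size: 1000) do |users|
  users.each { |user| NewsMailer.weekly(user).deliver_now }
end
```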
Conditions
----------
......