Commit 298c0092, authored by Fabian Hueske

[FLINK-3649] [docs] Add documentation for DataSet minBy / maxBy.

This closes #2104
Parent 7cc69434
@@ -213,7 +213,7 @@ val naturalNumbers = intNumbers.filter { _ > 0 }
**IMPORTANT:** The system assumes that the function does not modify the elements on which the predicate is applied. Violating this assumption
can lead to incorrect results.
### Project (Tuple DataSets only) (Java/Python API Only)
### Projection of Tuple DataSet
The Project transformation removes or moves Tuple fields of a Tuple DataSet.
The `project(int...)` method selects Tuple fields that should be retained by their index and defines their order in the output Tuple.
@@ -884,6 +884,42 @@ In contrast to that `.aggregate(SUM, 0).aggregate(MIN, 2)` will apply an aggrega
**Note:** The set of aggregation functions will be extended in the future.
### MinBy / MaxBy on Grouped Tuple DataSet
The MinBy (MaxBy) transformation selects a single tuple for each group of tuples. The selected tuple is the tuple whose values of one or more specified fields are minimum (maximum). If more than one field is specified, the fields are compared in the given order, i.e., a field is only used to break ties on the fields before it. The fields which are used for comparison must be valid key fields, i.e., comparable. If multiple tuples have minimum (maximum) field values, an arbitrary one of these tuples is returned.
The following code shows how to select the tuple with the minimum values for the `Integer` and `Double` fields for each group of tuples with the same `String` value from a `DataSet<Tuple3<Integer, String, Double>>`:
<div class="codetabs" markdown="1">
<div data-lang="java" markdown="1">
~~~java
DataSet<Tuple3<Integer, String, Double>> input = // [...]
DataSet<Tuple3<Integer, String, Double>> output = input
.groupBy(1) // group DataSet on second field
.minBy(0, 2); // select tuple with minimum values for the first and third fields.
~~~
</div>
<div data-lang="scala" markdown="1">
~~~scala
val input: DataSet[(Int, String, Double)] = // [...]
val output: DataSet[(Int, String, Double)] = input
.groupBy(1) // group DataSet on second field
.minBy(0, 2) // select tuple with minimum values for the first and third fields.
~~~
</div>
<div data-lang="python" markdown="1">
~~~python
Not supported.
~~~
</div>
</div>
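The grouped semantics can be checked without a cluster. The following plain-Java sketch does not use Flink at all; it emulates `groupBy(1).minBy(0, 2)` on an in-memory list. It assumes that the listed fields are compared in order (field 0 first, field 2 only to break ties); treat that ordering as an assumption of this sketch. `Row`, `GroupedMinBy`, and `minByPerGroup` are hypothetical names chosen for this illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for Flink's Tuple3<Integer, String, Double>.
final class Row {
    final int f0;
    final String f1;
    final double f2;

    Row(int f0, String f1, double f2) {
        this.f0 = f0;
        this.f1 = f1;
        this.f2 = f2;
    }

    @Override
    public String toString() {
        return "(" + f0 + "," + f1 + "," + f2 + ")";
    }
}

final class GroupedMinBy {
    // Assumed ordering: field 0 decides, field 2 only breaks ties on field 0.
    static final Comparator<Row> BY_F0_THEN_F2 =
        Comparator.<Row>comparingInt(r -> r.f0).thenComparingDouble(r -> r.f2);

    // Emulates input.groupBy(1).minBy(0, 2): one row per distinct f1 value.
    static List<Row> minByPerGroup(List<Row> input) {
        Map<String, Row> best = new LinkedHashMap<>();
        for (Row row : input) {
            // Replace the kept row only if the new one is strictly smaller, so the
            // first of several equal rows wins in this sketch; the documentation
            // above only promises that an arbitrary one is returned.
            best.merge(row.f1, row, (kept, next) ->
                BY_F0_THEN_F2.compare(next, kept) < 0 ? next : kept);
        }
        return new ArrayList<>(best.values());
    }
}
```

For example, feeding it `(3,a,1.0)`, `(1,a,5.0)`, `(1,a,2.0)`, `(7,b,0.5)`, and `(2,b,9.0)` yields `(1,a,2.0)` and `(2,b,9.0)`: in group `b` the smaller first field wins even though the other row has the smaller third field.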
### Reduce on full DataSet
The Reduce transformation applies a user-defined reduce function to all elements of a DataSet.
@@ -1018,6 +1054,40 @@ Not supported.
**Note:** Extending the set of supported aggregation functions is on our roadmap.
### MinBy / MaxBy on full Tuple DataSet
The MinBy (MaxBy) transformation selects a single tuple from a DataSet of tuples. The selected tuple is the tuple whose values of one or more specified fields are minimum (maximum). If more than one field is specified, the fields are compared in the given order, i.e., a field is only used to break ties on the fields before it. The fields which are used for comparison must be valid key fields, i.e., comparable. If multiple tuples have minimum (maximum) field values, an arbitrary one of these tuples is returned.
The following code shows how to select the tuple with the maximum values for the `Integer` and `Double` fields from a `DataSet<Tuple3<Integer, String, Double>>`:
<div class="codetabs" markdown="1">
<div data-lang="java" markdown="1">
~~~java
DataSet<Tuple3<Integer, String, Double>> input = // [...]
DataSet<Tuple3<Integer, String, Double>> output = input
.maxBy(0, 2); // select tuple with maximum values for the first and third fields.
~~~
</div>
<div data-lang="scala" markdown="1">
~~~scala
val input: DataSet[(Int, String, Double)] = // [...]
val output: DataSet[(Int, String, Double)] = input
.maxBy(0, 2) // select tuple with maximum values for the first and third fields.
~~~
</div>
<div data-lang="python" markdown="1">
~~~python
Not supported.
~~~
</div>
</div>
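Comparing several fields in order means that `maxBy(0, 2)` need not return the tuple holding the largest value of every listed field. The plain-Java sketch below has no Flink dependency and assumes ordered field comparison (field 0 first, field 2 only on ties); it emulates `maxBy(0, 2)` on a full in-memory list. `Triple`, `FullMaxBy`, and `maxBy` are hypothetical names used only here.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Hypothetical stand-in for Flink's Tuple3<Integer, String, Double>.
final class Triple {
    final int f0;
    final String f1;
    final double f2;

    Triple(int f0, String f1, double f2) {
        this.f0 = f0;
        this.f1 = f1;
        this.f2 = f2;
    }

    @Override
    public String toString() {
        return "(" + f0 + "," + f1 + "," + f2 + ")";
    }
}

final class FullMaxBy {
    // Assumed ordering: field 0 decides, field 2 is consulted only on ties.
    static final Comparator<Triple> BY_F0_THEN_F2 =
        Comparator.<Triple>comparingInt(t -> t.f0).thenComparingDouble(t -> t.f2);

    // Emulates input.maxBy(0, 2) over the whole list; an empty list has no result.
    static Optional<Triple> maxBy(List<Triple> input) {
        return input.stream().max(BY_F0_THEN_F2);
    }
}
```

With the inputs `(5,x,1.0)`, `(5,y,3.0)`, and `(2,z,99.0)` it returns `(5,y,3.0)`: the first field decides between 5 and 2, and the third field only breaks the tie between the two rows with 5. An empty list yields `Optional.empty()`, since there is no tuple to select.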
### Distinct
The Distinct transformation computes the DataSet of the distinct elements of the source DataSet.
......
@@ -460,6 +460,20 @@ The following transformations are available on data sets of Tuples:
{% highlight java %}
DataSet<Tuple3<Integer, Double, String>> in = // [...]
DataSet<Tuple2<String, Integer>> out = in.project(2,0);
{% endhighlight %}
</td>
</tr>
<tr>
<td><strong>MinBy / MaxBy</strong></td>
<td>
<p>Selects a tuple from a group of tuples whose values of one or more fields are minimum (maximum). The fields which are used for comparison must be valid key fields, i.e., comparable. If multiple tuples have minimum (maximum) field values, an arbitrary one of these tuples is returned. MinBy (MaxBy) may be applied to a full data set or a grouped data set.</p>
{% highlight java %}
DataSet<Tuple3<Integer, Double, String>> in = // [...]
// a DataSet with a single tuple with minimum values for the Integer and String fields.
DataSet<Tuple3<Integer, Double, String>> out = in.minBy(0, 2);
// a DataSet with one tuple for each group with the minimum value for the Double field.
DataSet<Tuple3<Integer, Double, String>> out2 = in.groupBy(2)
.minBy(1);
{% endhighlight %}
</td>
</tr>
@@ -728,6 +742,35 @@ val result3 = in.groupBy(0).sortGroup(1, Order.ASCENDING).first(3)
</tbody>
</table>
----------
The following transformations are available on data sets of Tuples:
<table class="table table-bordered">
<thead>
<tr>
<th class="text-left" style="width: 20%">Transformation</th>
<th class="text-center">Description</th>
</tr>
</thead>
<tbody>
<tr>
<td><strong>MinBy / MaxBy</strong></td>
<td>
<p>Selects a tuple from a group of tuples whose values of one or more fields are minimum (maximum). The fields which are used for comparison must be valid key fields, i.e., comparable. If multiple tuples have minimum (maximum) field values, an arbitrary one of these tuples is returned. MinBy (MaxBy) may be applied to a full data set or a grouped data set.</p>
{% highlight scala %}
val in: DataSet[(Int, Double, String)] = // [...]
// a data set with a single tuple with minimum values for the Int and String fields.
val out: DataSet[(Int, Double, String)] = in.minBy(0, 2)
// a data set with one tuple for each group with the minimum value for the Double field.
val out2: DataSet[(Int, Double, String)] = in.groupBy(2)
.minBy(1)
{% endhighlight %}
</td>
</tr>
</tbody>
</table>
Extraction from tuples, case classes and collections via anonymous pattern matching, like the following:
{% highlight scala %}
val data: DataSet[(Int, String, Double)] = // [...]
......