**IMPORTANT:** The system assumes that the function does not modify the elements on which the predicate is applied. Violating this assumption
can lead to incorrect results.
### Project (Tuple DataSets only) (Java/Python API Only)
### Projection of Tuple DataSet
The Project transformation removes or moves Tuple fields of a Tuple DataSet.
The `project(int...)` method selects Tuple fields that should be retained by their index and defines their order in the output Tuple.
...
...
@@ -884,6 +884,42 @@ In contrast to that `.aggregate(SUM, 0).aggregate(MIN, 2)` will apply an aggrega
**Note:** The set of aggregation functions will be extended in the future.
### MinBy / MaxBy on Grouped Tuple DataSet
The MinBy (MaxBy) transformation selects a single tuple for each group of tuples. The selected tuple is the tuple whose values of one or more specified fields are minimum (maximum). The fields which are used for comparison must be valid key fields, i.e., comparable. If multiple tuples have minimum (maximum) field values, an arbitrary tuple of these tuples is returned.
The following code shows how to select the tuple with the minimum values for the `Integer` and `Double` fields for each group of tuples with the same `String` value from a `DataSet<Tuple3<Integer, String, Double>>`:
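<div class="codetabs" markdown="1">
<div data-lang="java" markdown="1">
~~~java
DataSet<Tuple3<Integer, String, Double>> input = // [...]
DataSet<Tuple3<Integer, String, Double>> output = input
                                   .groupBy(1)   // group DataSet on second field
                                   .minBy(0, 2); // select tuple with minimum values for first and third field.
~~~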
</div>
<div data-lang="scala" markdown="1">
~~~scala
val input: DataSet[(Int, String, Double)] = // [...]
val output: DataSet[(Int, String, Double)] = input
                                   .groupBy(1)  // group DataSet on second field
                                   .minBy(0, 2) // select tuple with minimum values for first and third field.
~~~
</div>
<div data-lang="python" markdown="1">
~~~python
Not supported.
~~~
</div>
</div>
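The grouped selection described above can be sketched in plain Java without any Flink dependency. This is only an illustration of the semantics, under the assumption that the listed fields are compared in order and that later fields only break ties on earlier ones. The names `MinBySketch`, `compareOnFields`, `minBy`, and `groupedMinBy` are hypothetical and not part of the Flink API, and tuples are modeled as `Object[]` rows.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class MinBySketch {

    /** Compares two rows field by field, in the order the indices are listed (assumed semantics). */
    @SuppressWarnings({"unchecked", "rawtypes"})
    public static int compareOnFields(Object[] a, Object[] b, int... fields) {
        for (int f : fields) {
            int c = ((Comparable) a[f]).compareTo(b[f]);
            if (c != 0) {
                return c; // the first differing field decides
            }
        }
        return 0; // all compared fields are equal
    }

    /** Returns a row that is minimal on the given fields (here: the first one seen on ties). */
    public static Object[] minBy(List<Object[]> rows, int... fields) {
        Object[] best = rows.get(0);
        for (Object[] row : rows) {
            if (compareOnFields(row, best, fields) < 0) {
                best = row;
            }
        }
        return best;
    }

    /** Groups rows on one key field, then selects one minimal row per group. */
    public static Map<Object, Object[]> groupedMinBy(List<Object[]> rows, int keyField, int... fields) {
        Map<Object, List<Object[]>> groups = new LinkedHashMap<>();
        for (Object[] row : rows) {
            groups.computeIfAbsent(row[keyField], k -> new ArrayList<>()).add(row);
        }
        Map<Object, Object[]> result = new LinkedHashMap<>();
        for (Map.Entry<Object, List<Object[]>> e : groups.entrySet()) {
            result.put(e.getKey(), minBy(e.getValue(), fields));
        }
        return result;
    }

    public static void main(String[] args) {
        List<Object[]> rows = Arrays.asList(
            new Object[] {3, "a", 1.0},
            new Object[] {1, "a", 9.0},
            new Object[] {1, "a", 2.0},
            new Object[] {7, "b", 0.5});
        // Mirrors groupBy(1).minBy(0, 2): key on the String field, minimize on fields 0 and 2.
        for (Object[] row : groupedMinBy(rows, 1, 0, 2).values()) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

In this sketch, group `"a"` yields `[1, a, 2.0]`: two rows share the minimum `Integer` value 1, and the `Double` field breaks the tie. Group `"b"` contains a single row, which is returned unchanged.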
### Reduce on full DataSet
The Reduce transformation applies a user-defined reduce function to all elements of a DataSet.
...
...
@@ -1018,6 +1054,40 @@ Not supported.
**Note:** Extending the set of supported aggregation functions is on our roadmap.
### MinBy / MaxBy on full Tuple DataSet
The MinBy (MaxBy) transformation selects a single tuple from a DataSet of tuples. The selected tuple is the tuple whose values of one or more specified fields are minimum (maximum). The fields which are used for comparison must be valid key fields, i.e., comparable. If multiple tuples have minimum (maximum) field values, an arbitrary tuple of these tuples is returned.
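As a rough illustration of these semantics on a full data set, the following plain Java sketch picks one maximal row from an in-memory list. It does not use Flink. The class `MaxBySketch`, its `Row` type, and `maxByFirstAndThird` are hypothetical stand-ins, and the field-by-field ordering (first listed field dominates, later fields break ties) is an assumption of the sketch.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class MaxBySketch {

    /** Plain stand-in for a Tuple3<Integer, String, Double>; not a Flink type. */
    public static final class Row {
        public final int f0;
        public final String f1;
        public final double f2;

        public Row(int f0, String f1, double f2) {
            this.f0 = f0;
            this.f1 = f1;
            this.f2 = f2;
        }

        @Override
        public String toString() {
            return "(" + f0 + "," + f1 + "," + f2 + ")";
        }
    }

    /** Selects one row that is maximal on field 0, using field 2 to break ties. */
    public static Row maxByFirstAndThird(List<Row> rows) {
        Comparator<Row> order = Comparator
            .comparingInt((Row r) -> r.f0)    // assumed: the first listed field dominates
            .thenComparingDouble(r -> r.f2);  // the later field only breaks ties
        return Collections.max(rows, order);
    }

    public static void main(String[] args) {
        List<Row> rows = Arrays.asList(
            new Row(1, "x", 5.0),
            new Row(4, "y", 1.0),
            new Row(4, "z", 3.0));
        // Analogous to maxBy(0, 2) on the whole data set.
        System.out.println(maxByFirstAndThird(rows)); // prints (4,z,3.0)
    }
}
```

Unlike the grouped variant, the result is a single row for the entire input, regardless of the `String` field.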
The following code shows how to select the tuple with the maximum values for the `Integer` and `Double` fields from a `DataSet<Tuple3<Integer, String, Double>>`:
@@ -460,6 +460,20 @@ The following transformations are available on data sets of Tuples:
{% highlight java %}
DataSet<Tuple3<Integer,Double,String>> in = // [...]
DataSet<Tuple2<String,Integer>> out = in.project(2,0);
{% endhighlight %}
</td>
</tr>
<tr>
<td><strong>MinBy / MaxBy</strong></td>
<td>
<p>Selects a tuple from a group of tuples whose values of one or more fields are minimum (maximum). The fields which are used for comparison must be valid key fields, i.e., comparable. If multiple tuples have minimum (maximum) field values, an arbitrary tuple of these tuples is returned. MinBy (MaxBy) may be applied on a full data set or a grouped data set.</p>
{% highlight java %}
DataSet<Tuple3<Integer,Double,String>> in = // [...]
// a DataSet with a single tuple with minimum values for the Integer and String fields.
DataSet<Tuple3<Integer,Double,String>> out = in.minBy(0, 2);
// a DataSet with one tuple for each group with the minimum value for the Double field.
DataSet<Tuple3<Integer,Double,String>> out2 = in.groupBy(2)
                                                .minBy(1);
{% endhighlight %}
<p>Selects a tuple from a group of tuples whose values of one or more fields are minimum (maximum). The fields which are used for comparison must be valid key fields, i.e., comparable. If multiple tuples have minimum (maximum) field values, an arbitrary tuple of these tuples is returned. MinBy (MaxBy) may be applied on a full data set or a grouped data set.</p>
{% highlight scala %}
val in: DataSet[(Int, Double, String)] = // [...]
// a data set with a single tuple with minimum values for the Int and String fields.
val out: DataSet[(Int, Double, String)] = in.minBy(0, 2)
// a data set with one tuple for each group with the minimum value for the Double field.
val out2: DataSet[(Int, Double, String)] = in.groupBy(2)
.minBy(1)
{% endhighlight %}
</td>
</tr>
</tbody>
</table>
Extraction from tuples, case classes and collections via anonymous pattern matching, like the following:
{% highlight scala %}
val data: DataSet[(Int, String, Double)] = // [...]