diff --git a/docs/dataset_transformations.md b/docs/dataset_transformations.md
index ec038a7ca0876c966be4cec64143e969635f2f4b..f968b1ded588e5c89873274604f185e740e70b98 100644
--- a/docs/dataset_transformations.md
+++ b/docs/dataset_transformations.md
@@ -284,7 +284,7 @@ When using Case Classes you can also specify the grouping key using the names of
 ~~~scala
 case class MyClass(val a: String, b: Int, c: Double)
-val tuples = DataSet[MyClass]] = // [...]
+val tuples = DataSet[MyClass] = // [...]
 // group on the first and second field
 val reducedTuples = tuples.groupBy("a", "b").reduce { ... }
 ~~~
@@ -1103,15 +1103,11 @@ val unioned = vals1.union(vals2).union(vals3)
-### Rebalance (Java API Only)
-
+### Rebalance
 Evenly rebalances the parallel partitions of a DataSet to eliminate data skew.
-Only Map-like transformations may follow a rebalance transformation, i.e.,
-- Map
-- FlatMap
-- Filter
-- MapPartition
+
+
 ~~~java
 DataSet in = // [...]
@@ -1120,16 +1116,26 @@ DataSet> out = in.rebalance()
     .map(new Mapper());
 ~~~
-### Hash-Partition (Java API Only)
+
+
+
+~~~scala
+val in: DataSet[String] = // [...]
+// rebalance DataSet and apply a Map transformation.
+val out = in.rebalance().map { ... }
+~~~
+
+
+
+
+
+### Hash-Partition
 Hash-partitions a DataSet on a given key.
 Keys can be specified as key-selector functions or field position keys (see [Reduce examples](#reduce-on-grouped-dataset) for how to specify keys).
-Only Map-like transformations may follow a hash-partition transformation, i.e.,
-- Map
-- FlatMap
-- Filter
-- MapPartition
+
+
 ~~~java
 DataSet> in = // [...]
@@ -1138,10 +1144,25 @@ DataSet> out = in.partitionByHash(0)
     .mapPartition(new PartitionMapper());
 ~~~
-### First-n (Java API Only)
+
+
+
+~~~scala
+val in: DataSet[(String, Int)] = // [...]
+// hash-partition DataSet by String value and apply a MapPartition transformation.
+val out = in.partitionByHash(0).mapPartition { ... }
+~~~
+
+
+
+
+### First-n
 Returns the first n (arbitrary) elements of a DataSet. First-n can be applied on a regular DataSet, a grouped DataSet, or a grouped-sorted DataSet.
 Grouping keys can be specified as key-selector functions or field position keys (see [Reduce examples](#reduce-on-grouped-dataset) for how to specify keys).
+
+
+
 ~~~java
 DataSet> in = // [...]
 // Return the first five (arbitrary) elements of the DataSet
@@ -1155,4 +1176,22 @@ DataSet> out2 = in.groupBy(0)
 DataSet> out3 = in.groupBy(0)
     .sortGroup(1, Order.ASCENDING)
     .first(3);
-~~~
\ No newline at end of file
+~~~
+
+
+
+
+~~~scala
+val in: DataSet[(String, Int)] = // [...]
+// Return the first five (arbitrary) elements of the DataSet
+val out1 = in.first(5)
+
+// Return the first two (arbitrary) elements of each String group
+val out2 = in.groupBy(0).first(2)
+
+// Return the first three elements of each String group ordered by the Integer field
+val out3 = in.groupBy(0).sortGroup(1, Order.ASCENDING).first(3)
+~~~
+
+
+
\ No newline at end of file
diff --git a/docs/programming_guide.md b/docs/programming_guide.md
index 6e174ac4d4ec3ac98cea66f7d2834f1f91a1b42e..8d0b299ef43dc11bb18a6247673abf0505224883 100644
--- a/docs/programming_guide.md
+++ b/docs/programming_guide.md
@@ -608,7 +608,7 @@ DataSet result = in.rebalance()
 Hash-Partition
-Hash-partitions a data set on a given key. Keys can be specified as key-selector functions or field position keys. Only Map-like transformations may follow a hash-partition transformation. (Java API Only)
+Hash-partitions a data set on a given key. Keys can be specified as key-selector functions or field position keys.
 {% highlight java %}
 DataSet> in = // [...]
 DataSet result = in.partitionByHash(0)
@@ -804,6 +804,33 @@ val result: DataSet[(Int, String)] = data1.cross(data2)
 Produces the union of two data sets.
 {% highlight scala %}
 data.union(data2)
+{% endhighlight %}
+
+
+
+Hash-Partition
+
+Hash-partitions a data set on a given key. Keys can be specified as key-selector functions, tuple positions
+or case class fields.
+{% highlight scala %}
+val in: DataSet[(Int, String)] = // [...]
+val result = in.partitionByHash(0).mapPartition { ... }
+{% endhighlight %}
+
+
+
+First-n
+
+Returns the first n (arbitrary) elements of a data set. First-n can be applied on a regular data set, a grouped data set, or a grouped-sorted data set. Grouping keys can be specified as key-selector functions,
+tuple positions or case class fields.
+{% highlight scala %}
+val in: DataSet[(Int, String)] = // [...]
+// regular data set
+val result1 = in.first(3)
+// grouped data set
+val result2 = in.groupBy(0).first(3)
+// grouped-sorted data set
+val result3 = in.groupBy(0).sortGroup(1, Order.ASCENDING).first(3)
 {% endhighlight %}