Unverified commit 2294cdda authored by Z zentol, committed by Till Rohrmann

[FLINK-10509][storm] Remove flink-storm

This closes #7453.
Parent a07ce7f6
@@ -27,7 +27,7 @@ Learn more about Flink at [http://flink.apache.org/](http://flink.apache.org/)
* Custom memory management for efficient and robust switching between in-memory and out-of-core data processing algorithms
* Compatibility layers for Apache Hadoop MapReduce and Apache Storm
* Compatibility layers for Apache Hadoop MapReduce
* Integration with YARN, HDFS, HBase, and other components of the Apache Hadoop ecosystem
---
title: "Storm Compatibility"
is_beta: true
nav-parent_id: libs
nav-pos: 2
---
<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->
[Flink streaming]({{ site.baseurl }}/dev/datastream_api.html) is compatible with Apache Storm interfaces and therefore allows
reusing code that was implemented for Storm.
You can use Storm `Spout`/`Bolt` as source/operator in Flink streaming programs.
This document shows how to use existing Storm code with Flink.
* This will be replaced by the TOC
{:toc}
# Project Configuration
Support for Storm is contained in the `flink-storm` Maven module.
The code resides in the `org.apache.flink.storm` package.
Add the following dependency to your `pom.xml` if you want to execute Storm code in Flink.
{% highlight xml %}
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-storm{{ site.scala_version_suffix }}</artifactId>
<version>{{site.version}}</version>
</dependency>
{% endhighlight %}
**Please note**: Do not add `storm-core` as a dependency. It is already included via `flink-storm`.
**Please note**: `flink-storm` is not part of the provided binary Flink distribution.
Thus, you need to include `flink-storm` classes (and their dependencies) in your program jar (also called uber-jar or fat-jar) that is submitted to Flink's JobManager.
See *WordCount Storm* within `flink-storm-examples/pom.xml` for an example of how to package a jar correctly.
If you want to avoid large uber-jars, you can manually copy `storm-core-1.0.0.jar`, `json-simple-1.1.jar` and `flink-storm-{{site.version}}.jar` into Flink's `lib/` folder of each cluster node (*before* the cluster is started).
For this case, it is sufficient to include only your own Spout and Bolt classes (and their internal dependencies) into the program jar.
# Embed Storm Operators in Flink Streaming Programs
Spouts and Bolts can be embedded into regular streaming programs.
The Storm compatibility layer offers a wrapper class for each, namely `SpoutWrapper` and `BoltWrapper` (`org.apache.flink.storm.wrappers`).
By default, both wrappers convert Storm output tuples to Flink's [Tuple]({{site.baseurl}}/dev/api_concepts.html#tuples-and-case-classes) types (i.e., `Tuple0` to `Tuple25`, according to the number of fields of the Storm tuples).
For single-field output tuples, a conversion to the field's data type is also possible (e.g., `String` instead of `Tuple1<String>`).
Because Flink cannot infer the output field types of Storm operators, it is required to specify the output type manually.
In order to get the correct `TypeInformation` object, Flink's `TypeExtractor` can be used.
## Embed Spouts
In order to use a Spout as Flink source, use `StreamExecutionEnvironment.addSource(SourceFunction, TypeInformation)`.
The Spout object is handed to the constructor of `SpoutWrapper<OUT>`, which serves as the first argument to `addSource(...)`.
The generic type declaration `OUT` specifies the type of the source output stream.
<div class="codetabs" markdown="1">
<div data-lang="java" markdown="1">
{% highlight java %}
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
// stream has `raw` type (single field output streams only)
DataStream<String> rawInput = env.addSource(
new SpoutWrapper<String>(new FileSpout(localFilePath), new String[] { Utils.DEFAULT_STREAM_ID }), // emit default output stream as raw type
TypeExtractor.getForClass(String.class)); // output type
// process data stream
[...]
{% endhighlight %}
</div>
</div>
If a Spout emits a finite number of tuples, `SpoutWrapper` can be configured to terminate automatically by setting the `numberOfInvocations` parameter in its constructor.
This allows the Flink program to shut down automatically after all data is processed.
By default the program will run until it is [canceled]({{site.baseurl}}/ops/cli.html) manually.
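For example, a minimal sketch (reusing `FileSpout` from the example above; the three-argument constructor is the same one used by `SpoutSplitExample`, and the limit of 10 invocations is an arbitrary choice):
<div class="codetabs" markdown="1">
<div data-lang="java" markdown="1">
{% highlight java %}
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
// terminate the source automatically after nextTuple() was called 10 times
DataStream<String> finiteInput = env.addSource(
new SpoutWrapper<String>(new FileSpout(localFilePath), new String[] { Utils.DEFAULT_STREAM_ID }, 10),
TypeExtractor.getForClass(String.class)); // output type
{% endhighlight %}
</div>
</div>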
## Embed Bolts
In order to use a Bolt as Flink operator, use `DataStream.transform(String, TypeInformation, OneInputStreamOperator)`.
The Bolt object is handed to the constructor of `BoltWrapper<IN,OUT>`, which serves as the last argument to `transform(...)`.
The generic type declarations `IN` and `OUT` specify the type of the operator's input and output stream, respectively.
<div class="codetabs" markdown="1">
<div data-lang="java" markdown="1">
{% highlight java %}
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
DataStream<String> text = env.readTextFile(localFilePath);
DataStream<Tuple2<String, Integer>> counts = text.transform(
"tokenizer", // operator name
TypeExtractor.getForObject(new Tuple2<String, Integer>("", 0)), // output type
new BoltWrapper<String, Tuple2<String, Integer>>(new BoltTokenizer())); // Bolt operator
// do further processing
[...]
{% endhighlight %}
</div>
</div>
### Named Attribute Access for Embedded Bolts
Bolts can access input tuple fields by name (in addition to access by index).
To use this feature with embedded Bolts, you need to have either a
1. [POJO]({{site.baseurl}}/dev/api_concepts.html#pojos) type input stream or
2. [Tuple]({{site.baseurl}}/dev/api_concepts.html#tuples-and-case-classes) type input stream and specify the input schema (i.e., a name-to-index mapping)
For POJO input types, Flink accesses the fields via reflection.
For this case, Flink expects either a corresponding public member variable or public getter method.
For example, if a Bolt accesses a field via the name `sentence` (e.g., `String s = input.getStringByField("sentence");`), the input POJO class must have a member variable `public String sentence;` or a method `public String getSentence() { ... }` (pay attention to camel-case naming).
For `Tuple` input types, it is required to specify the input schema using Storm's `Fields` class.
For this case, the constructor of `BoltWrapper` takes an additional argument: `new BoltWrapper<Tuple1<String>, ...>(..., new Fields("sentence"))`.
The input type is `Tuple1<String>` and `Fields("sentence")` specifies that `input.getStringByField("sentence")` is equivalent to `input.getString(0)`.
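A minimal sketch of the `Tuple` case (the Bolt class name `BoltTokenizerByName` is a placeholder for any Bolt that reads the field `sentence` by name):
<div class="codetabs" markdown="1">
<div data-lang="java" markdown="1">
{% highlight java %}
// input stream with a single String field named "sentence" (index 0)
DataStream<Tuple1<String>> sentences = ...
DataStream<Tuple2<String, Integer>> counts = sentences.transform(
"tokenizer", // operator name
TypeExtractor.getForObject(new Tuple2<String, Integer>("", 0)), // output type
new BoltWrapper<Tuple1<String>, Tuple2<String, Integer>>(
new BoltTokenizerByName(), new Fields("sentence"))); // name-to-index mapping
{% endhighlight %}
</div>
</div>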
See [BoltTokenizerWordCountPojo](https://github.com/apache/flink/tree/master/flink-contrib/flink-storm-examples/src/main/java/org/apache/flink/storm/wordcount/BoltTokenizerWordCountPojo.java) and [BoltTokenizerWordCountWithNames](https://github.com/apache/flink/tree/master/flink-contrib/flink-storm-examples/src/main/java/org/apache/flink/storm/wordcount/BoltTokenizerWordCountWithNames.java) for examples.
## Configuring Spouts and Bolts
In Storm, Spouts and Bolts can be configured with a globally distributed `Map` object that is given to the `submitTopology(...)` method of `LocalCluster` or `StormSubmitter`.
This `Map` is provided by the user alongside the topology and gets forwarded as a parameter to the calls `Spout.open(...)` and `Bolt.prepare(...)`.
To replicate this functionality, Flink's configuration mechanism must be used.
A global configuration can be set in a `StreamExecutionEnvironment` via `.getConfig().setGlobalJobParameters(...)`.
Flink's regular `Configuration` class can be used to configure Spouts and Bolts.
However, `Configuration` does not support arbitrary key data types as Storm does (only `String` keys are allowed).
Thus, Flink additionally provides the `StormConfig` class, which can be used like a raw `Map` to provide full compatibility with Storm.
<div class="codetabs" markdown="1">
<div data-lang="java" markdown="1">
{% highlight java %}
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
StormConfig config = new StormConfig();
// set config values
[...]
// set global Storm configuration
env.getConfig().setGlobalJobParameters(config);
// assemble program with embedded Spouts and/or Bolts
[...]
{% endhighlight %}
</div>
</div>
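On the operator side, the registered `StormConfig` arrives as the regular Storm configuration `Map` in `Spout.open(...)` and `Bolt.prepare(...)`, so existing Storm code can read it unchanged. A minimal sketch (the key `separator` is a hypothetical example; see `ExclamationBolt` in `flink-storm-examples` for a complete case):
<div class="codetabs" markdown="1">
<div data-lang="java" markdown="1">
{% highlight java %}
public class ConfiguredBolt extends BaseRichBolt {
private String separator;

@SuppressWarnings("rawtypes")
@Override
public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
// conf contains the entries of the StormConfig set via setGlobalJobParameters(...)
this.separator = (String) conf.get("separator"); // hypothetical key
}

[...] // implement execute(), declareOutputFields(), ...
}
{% endhighlight %}
</div>
</div>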
## Multiple Output Streams
Flink can also handle the declaration of multiple output streams for Spouts and Bolts.
The output stream will be of data type `SplitStreamType<T>` and must be split by using `DataStream.split(...)` and `SplitStream.select(...)`.
Flink already provides the predefined output selector `StormStreamSelector<T>` for `.split(...)`.
Furthermore, the wrapper type `SplitStreamType<T>` can be removed using `SplitStreamMapper<T>`.
<div class="codetabs" markdown="1">
<div data-lang="java" markdown="1">
{% highlight java %}
[...]
// get DataStream from Spout or Bolt which declares two output streams s1 and s2 with output type SomeType
DataStream<SplitStreamType<SomeType>> multiStream = ...
SplitStream<SplitStreamType<SomeType>> splitStream = multiStream.split(new StormStreamSelector<SomeType>());
// remove SplitStreamType using SplitStreamMapper to get data stream of type SomeType
DataStream<SomeType> s1 = splitStream.select("s1").map(new SplitStreamMapper<SomeType>()).returns(SomeType.class);
DataStream<SomeType> s2 = splitStream.select("s2").map(new SplitStreamMapper<SomeType>()).returns(SomeType.class);
// do further processing on s1 and s2
[...]
{% endhighlight %}
</div>
</div>
See [SpoutSplitExample.java](https://github.com/apache/flink/tree/master/flink-contrib/flink-storm-examples/src/main/java/org/apache/flink/storm/split/SpoutSplitExample.java) for a full example.
# Flink Extensions
## Finite Spouts
In Flink, streaming sources can be finite, i.e., emit a finite number of records and stop after emitting the last record. However, Spouts usually emit infinite streams.
The bridge between the two approaches is the `FiniteSpout` interface which, in addition to `IRichSpout`, contains a `reachedEnd()` method in which the user can specify a stopping condition.
The user can create a finite Spout by implementing this interface instead of (or in addition to) `IRichSpout` and implementing the `reachedEnd()` method.
In contrast to a `SpoutWrapper` that is configured to emit a finite number of tuples, the `FiniteSpout` interface allows implementing more complex termination criteria.
Although finite Spouts are not necessary to embed Spouts into a Flink streaming program or to submit a whole Storm topology to Flink, there are cases where they may come in handy:
* to make a native Spout behave the same way as a finite Flink source with minimal modifications
* to process a stream only for some time, after which the Spout can stop automatically
* reading a file into a stream
* for testing purposes
An example of a finite Spout that emits records for 10 seconds only:
<div class="codetabs" markdown="1">
<div data-lang="java" markdown="1">
{% highlight java %}
public class TimedFiniteSpout extends BaseRichSpout implements FiniteSpout {
[...] // implement open(), nextTuple(), ...
private long starttime = System.currentTimeMillis();
public boolean reachedEnd() {
return System.currentTimeMillis() - starttime > 10000L;
}
}
{% endhighlight %}
</div>
</div>
# Storm Compatibility Examples
You can find more examples in Maven module `flink-storm-examples`.
For the different versions of WordCount, see [README.md](https://github.com/apache/flink/tree/master/flink-contrib/flink-storm-examples/README.md).
To run the examples, you need to assemble a correct jar file.
`flink-storm-examples-{{ site.version }}.jar` is **not** a valid jar file for job execution (it is only a standard Maven artifact).
There are example jars for embedded Spout and Bolt, namely `WordCount-SpoutSource.jar` and `WordCount-BoltTokenizer.jar`, respectively.
Compare `pom.xml` to see how both jars are built.
You can run each of those examples via `bin/flink run <jarname>.jar`. The correct entry point class is contained in each jar's manifest file.
{% top %}
---
title: "Storm Compatibility"
layout: redirect
redirect: /dev/libs/storm_compatibility.html
permalink: /apis/streaming/storm_compatibility.html
---
<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->
# flink-storm-examples
This module contains multiple versions of a simple Word-Count example to illustrate the usage of the compatibility layer:
* the usage of spouts and bolts within a regular Flink streaming program (i.e., embedded mode)
1. `SpoutSourceWordCount` uses a spout as data source within a Flink streaming program
2. `BoltTokenizerWordCount` uses a bolt to split sentences into words within a Flink streaming program
* `BoltTokenizerWordCountWithNames` uses `Tuple` input type and accesses attributes by field names (rather than index)
* `BoltTokenizerWordCountPOJO` uses POJO input type and accesses attributes by field names (rather than index)
* how to submit a whole Storm topology to Flink
3. `WordCountTopology` plugs a Storm topology together
* `StormWordCountLocal` submits the topology to a local Flink cluster (similar to a `LocalCluster` in Storm)
(`WordCountLocalByName` accesses attributes by field names rather than index)
* `WordCountRemoteByClient` submits the topology to a remote Flink cluster (similar to the usage of `NimbusClient` in Storm)
* `WordCountRemoteBySubmitter` submits the topology to a remote Flink cluster (similar to the usage of `StormSubmitter` in Storm)
Additionally, this module packages the three example Word-Count programs as jar files to be submitted to a Flink cluster via `bin/flink run example.jar`.
(Valid jars are `WordCount-SpoutSource.jar`, `WordCount-BoltTokenizer.jar`, and `WordCount-StormTopology.jar`)
The package `org.apache.flink.storm.wordcount.operators` contains original spouts and bolts that can be used unmodified within Storm or Flink.
<?xml version="1.0" encoding="UTF-8"?>
<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
<modelVersion>4.0.0</modelVersion>
<parent>
<groupId>org.apache.flink</groupId>
<artifactId>flink-contrib</artifactId>
<version>1.8-SNAPSHOT</version>
<relativePath>..</relativePath>
</parent>
<artifactId>flink-storm-examples_${scala.binary.version}</artifactId>
<name>flink-storm-examples</name>
<packaging>jar</packaging>
<repositories>
<!-- This repository is needed as a stable source for some Clojure libraries -->
<repository>
<id>clojars</id>
<url>https://clojars.org/repo/</url>
<releases>
<enabled>true</enabled>
</releases>
<snapshots>
<enabled>false</enabled>
</snapshots>
</repository>
</repositories>
<dependencies>
<!-- core dependencies -->
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-storm_${scala.binary.version}</artifactId>
<version>${project.version}</version>
</dependency>
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-streaming-java_${scala.binary.version}</artifactId>
<version>${project.version}</version>
</dependency>
<dependency>
<groupId>org.apache.storm</groupId>
<artifactId>storm-starter</artifactId>
<version>1.0.0</version>
<!-- remove storm dependency - it should be pulled in only (with proper
customization) via the 'flink-storm' dependency -->
<exclusions>
<exclusion>
<groupId>org.apache.storm</groupId>
<artifactId>storm-core</artifactId>
</exclusion>
<exclusion>
<groupId>org.apache.curator</groupId>
<artifactId>curator-framework</artifactId>
</exclusion>
</exclusions>
</dependency>
<!-- test dependencies -->
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-shaded-guava</artifactId>
</dependency>
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-test-utils_${scala.binary.version}</artifactId>
<version>${project.version}</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-streaming-java_${scala.binary.version}</artifactId>
<version>${project.version}</version>
<scope>test</scope>
<type>test-jar</type>
</dependency>
</dependencies>
<build>
<plugins>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-deploy-plugin</artifactId>
<configuration>
<skip>true</skip>
</configuration>
</plugin>
<!-- unpack flink-storm and Storm dependencies so their classes can be bundled into the example jars -->
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-dependency-plugin</artifactId>
<version>2.9</version><!--$NO-MVN-MAN-VER$-->
<executions>
<execution>
<id>unpack</id>
<phase>prepare-package</phase>
<goals>
<goal>unpack</goal>
</goals>
<configuration>
<artifactItems>
<artifactItem>
<groupId>org.apache.flink</groupId>
<artifactId>flink-storm_${scala.binary.version}</artifactId>
<version>${project.version}</version>
<type>jar</type>
<overWrite>false</overWrite>
<outputDirectory>${project.build.directory}/classes</outputDirectory>
</artifactItem>
<artifactItem>
<groupId>org.apache.storm</groupId>
<artifactId>storm-core</artifactId>
<version>1.0.0</version>
<type>jar</type>
<overWrite>false</overWrite>
<outputDirectory>${project.build.directory}/classes</outputDirectory>
<!-- need to exclude to be able to run
* StormWordCountRemoteByClient and
* StormWordCountRemoteBySubmitter
within Eclipse -->
<excludes>defaults.yaml</excludes>
</artifactItem>
<artifactItem>
<groupId>com.googlecode.json-simple</groupId>
<artifactId>json-simple</artifactId>
<version>1.1</version>
<type>jar</type>
<overWrite>false</overWrite>
<outputDirectory>${project.build.directory}/classes</outputDirectory>
</artifactItem>
<artifactItem>
<groupId>org.yaml</groupId>
<artifactId>snakeyaml</artifactId>
<version>1.11</version>
<type>jar</type>
<overWrite>false</overWrite>
<outputDirectory>${project.build.directory}/classes</outputDirectory>
</artifactItem>
</artifactItems>
</configuration>
</execution>
</executions>
</plugin>
<!-- self-contained jars for each example -->
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-jar-plugin</artifactId>
<executions>
<!-- WordCount Spout source-->
<!-- example for embedded spout - for whole topologies see "WordCount Storm topology" example below -->
<execution>
<id>WordCount-SpoutSource</id>
<phase>package</phase>
<goals>
<goal>jar</goal>
</goals>
<configuration>
<finalName>WordCount</finalName>
<classifier>SpoutSource</classifier>
<archive>
<manifestEntries>
<program-class>org.apache.flink.storm.wordcount.SpoutSourceWordCount</program-class>
</manifestEntries>
</archive>
<includes>
<!-- from storm-core -->
<include>org/apache/storm/topology/*.class</include>
<include>org/apache/storm/spout/*.class</include>
<include>org/apache/storm/task/*.class</include>
<include>org/apache/storm/tuple/*.class</include>
<include>org/apache/storm/generated/*.class</include>
<include>org/apache/storm/metric/**/*.class</include>
<include>org/apache/storm/thrift/**/*.class</include>
<!-- Storm's recursive dependencies -->
<include>org/json/simple/**/*.class</include>
<include>org/apache/storm/shade/**/*.class</include>
<!-- compatibility layer -->
<include>org/apache/flink/storm/api/*.class</include>
<include>org/apache/flink/storm/util/*.class</include>
<include>org/apache/flink/storm/wrappers/*.class</include>
<!-- Word Count -->
<include>org/apache/flink/storm/wordcount/SpoutSourceWordCount.class</include>
<include>org/apache/flink/storm/wordcount/SpoutSourceWordCount$*.class</include>
<include>org/apache/flink/storm/wordcount/operators/WordCountFileSpout.class</include>
<include>org/apache/flink/storm/wordcount/operators/WordCountInMemorySpout.class
</include>
<include>org/apache/flink/storm/util/AbstractLineSpout.class</include>
<include>org/apache/flink/storm/util/FileSpout.class</include>
<include>org/apache/flink/storm/util/InMemorySpout.class</include>
<include>org/apache/flink/storm/wordcount/util/WordCountData.class</include>
</includes>
</configuration>
</execution>
<!-- WordCount Bolt tokenizer-->
<!-- example for embedded bolt - for whole topologies see "WordCount Storm topology" example below -->
<execution>
<id>WordCount-BoltTokenizer</id>
<phase>package</phase>
<goals>
<goal>jar</goal>
</goals>
<configuration>
<finalName>WordCount</finalName>
<classifier>BoltTokenizer</classifier>
<archive>
<manifestEntries>
<program-class>org.apache.flink.storm.wordcount.BoltTokenizerWordCount
</program-class>
</manifestEntries>
</archive>
<includes>
<!-- from storm-core -->
<include>org/apache/storm/topology/*.class</include>
<include>org/apache/storm/spout/*.class</include>
<include>org/apache/storm/task/*.class</include>
<include>org/apache/storm/tuple/*.class</include>
<include>org/apache/storm/generated/*.class</include>
<include>org/apache/storm/metric/**/*.class</include>
<include>org/apache/storm/thrift/**/*.class</include>
<!-- Storm's recursive dependencies -->
<include>org/json/simple/**/*.class</include>
<include>org/apache/storm/shade/**/*.class</include>
<!-- compatibility layer -->
<include>org/apache/flink/storm/api/*.class</include>
<include>org/apache/flink/storm/util/*.class</include>
<include>org/apache/flink/storm/wrappers/*.class</include>
<!-- Word Count -->
<include>org/apache/flink/storm/wordcount/BoltTokenizerWordCount.class</include>
<include>org/apache/flink/storm/wordcount/operators/BoltTokenizer.class</include>
<include>org/apache/flink/storm/wordcount/util/WordCountData.class</include>
</includes>
</configuration>
</execution>
<execution>
<goals>
<goal>test-jar</goal>
</goals>
</execution>
</executions>
</plugin>
</plugins>
<pluginManagement>
<plugins>
<!--This plugin's configuration is used to store Eclipse m2e settings only. It has no influence on the Maven build itself.-->
<plugin>
<groupId>org.eclipse.m2e</groupId>
<artifactId>lifecycle-mapping</artifactId>
<version>1.0.0</version>
<configuration>
<lifecycleMappingMetadata>
<pluginExecutions>
<pluginExecution>
<pluginExecutionFilter>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-dependency-plugin</artifactId>
<versionRange>[2.9,)</versionRange>
<goals>
<goal>unpack</goal>
</goals>
</pluginExecutionFilter>
<action>
<ignore/>
</action>
</pluginExecution>
</pluginExecutions>
</lifecycleMappingMetadata>
</configuration>
</plugin>
</plugins>
</pluginManagement>
</build>
</project>
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package org.apache.flink.storm.exclamation;
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.java.typeutils.TypeExtractor;
import org.apache.flink.storm.exclamation.operators.ExclamationBolt;
import org.apache.flink.storm.util.StormConfig;
import org.apache.flink.storm.wordcount.util.WordCountData;
import org.apache.flink.storm.wrappers.BoltWrapper;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.storm.utils.Utils;
/**
* Implements the "Exclamation" program that attaches 3+x exclamation marks to every line of a text file in a streaming
* fashion. The program is constructed as a regular Flink streaming program that embeds the Storm {@link ExclamationBolt}.
*
* <p>The input is a plain text file with lines separated by newline characters.
*
* <p>Usage:
* <code>ExclamationWithBolt &lt;text path&gt; &lt;result path&gt; &lt;number of exclamation marks&gt;</code><br>
* If no parameters are provided, the program is run with default data from {@link WordCountData} with x=2.
*
* <p>This example shows how to:
* <ul>
* <li>use a Bolt within a Flink Streaming program</li>
<li>configure a Bolt using StormConfig</li>
* </ul>
*/
public class ExclamationWithBolt {
// *************************************************************************
// PROGRAM
// *************************************************************************
public static void main(final String[] args) throws Exception {
if (!parseParameters(args)) {
return;
}
// set up the execution environment
final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
// set Storm configuration
StormConfig config = new StormConfig();
config.put(ExclamationBolt.EXCLAMATION_COUNT, Integer.valueOf(exclamationNum));
env.getConfig().setGlobalJobParameters(config);
// get input data
final DataStream<String> text = getTextDataStream(env);
final DataStream<String> exclaimed = text
.transform("StormBoltTokenizer",
TypeExtractor.getForObject(""),
new BoltWrapper<String, String>(new ExclamationBolt(),
new String[] { Utils.DEFAULT_STREAM_ID }))
.map(new ExclamationMap());
// emit result
if (fileOutput) {
exclaimed.writeAsText(outputPath);
} else {
exclaimed.print();
}
// execute program
env.execute("Streaming WordCount with bolt tokenizer");
}
// *************************************************************************
// USER FUNCTIONS
// *************************************************************************
private static class ExclamationMap implements MapFunction<String, String> {
private static final long serialVersionUID = 4614754344067170619L;
@Override
public String map(String value) throws Exception {
return value + "!!!";
}
}
// *************************************************************************
// UTIL METHODS
// *************************************************************************
private static boolean fileOutput = false;
private static String textPath;
private static String outputPath;
private static int exclamationNum = 2;
private static boolean parseParameters(final String[] args) {
if (args.length > 0) {
// parse input arguments
fileOutput = true;
if (args.length == 3) {
textPath = args[0];
outputPath = args[1];
exclamationNum = Integer.parseInt(args[2]);
} else {
System.err.println("Usage: ExclamationWithBolt <text path> <result path> <number of exclamation marks>");
return false;
}
} else {
System.out.println("Executing ExclamationWithBolt example with built-in default data");
System.out.println(" Provide parameters to read input data from a file");
System.out.println(" Usage: ExclamationWithBolt <text path> <result path> <number of exclamation marks>");
}
return true;
}
private static DataStream<String> getTextDataStream(final StreamExecutionEnvironment env) {
if (fileOutput) {
// read the text file from given input path
return env.readTextFile(textPath);
}
return env.fromElements(WordCountData.WORDS);
}
}
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package org.apache.flink.storm.exclamation;
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.java.typeutils.TypeExtractor;
import org.apache.flink.storm.util.FiniteFileSpout;
import org.apache.flink.storm.util.FiniteInMemorySpout;
import org.apache.flink.storm.util.StormConfig;
import org.apache.flink.storm.wordcount.util.WordCountData;
import org.apache.flink.storm.wrappers.SpoutWrapper;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.storm.utils.Utils;
/**
* Implements the "Exclamation" program that attaches six exclamation marks to every line of a text file in a streaming
* fashion. The program is constructed as a regular Flink streaming program that embeds a Storm Spout.
*
* <p>The input is a plain text file with lines separated by newline characters.
*
* <p>Usage: <code>ExclamationWithSpout &lt;text path&gt; &lt;result path&gt;</code><br>
* If no parameters are provided, the program is run with default data from {@link WordCountData}.
*
* <p>This example shows how to:
* <ul>
* <li>use a Storm spout within a Flink Streaming program</li>
<li>make use of the FiniteSpout interface</li>
<li>configure a Spout using StormConfig</li>
* </ul>
*/
public class ExclamationWithSpout {
// *************************************************************************
// PROGRAM
// *************************************************************************
public static void main(final String[] args) throws Exception {
if (!parseParameters(args)) {
return;
}
// set up the execution environment
final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
// get input data
final DataStream<String> text = getTextDataStream(env);
final DataStream<String> exclaimed = text
.map(new ExclamationMap())
.map(new ExclamationMap());
// emit result
if (fileOutput) {
exclaimed.writeAsText(outputPath);
} else {
exclaimed.print();
}
// execute program
env.execute("Streaming Exclamation with Storm spout source");
}
// *************************************************************************
// USER FUNCTIONS
// *************************************************************************
private static class ExclamationMap implements MapFunction<String, String> {
private static final long serialVersionUID = -684993133807698042L;
@Override
public String map(String value) throws Exception {
return value + "!!!";
}
}
// *************************************************************************
// UTIL METHODS
// *************************************************************************
private static boolean fileOutput = false;
private static String textPath;
private static String outputPath;
private static boolean parseParameters(final String[] args) {
if (args.length > 0) {
// parse input arguments
fileOutput = true;
if (args.length == 2) {
textPath = args[0];
outputPath = args[1];
} else {
System.err.println("Usage: ExclamationWithSpout <text path> <result path>");
return false;
}
} else {
System.out.println("Executing ExclamationWithSpout example with built-in default data");
System.out.println(" Provide parameters to read input data from a file");
System.out.println(" Usage: ExclamationWithSpout <text path> <result path>");
}
return true;
}
private static DataStream<String> getTextDataStream(final StreamExecutionEnvironment env) {
if (fileOutput) {
final String[] tokens = textPath.split(":");
final String inputFile = tokens[tokens.length - 1];
// set Storm configuration
StormConfig config = new StormConfig();
config.put(FiniteFileSpout.INPUT_FILE_PATH, inputFile);
env.getConfig().setGlobalJobParameters(config);
return env.addSource(
new SpoutWrapper<String>(new FiniteFileSpout(),
new String[] { Utils.DEFAULT_STREAM_ID }),
TypeExtractor.getForClass(String.class)).setParallelism(1);
}
return env.addSource(
new SpoutWrapper<String>(new FiniteInMemorySpout(
WordCountData.WORDS), new String[] { Utils.DEFAULT_STREAM_ID }),
TypeExtractor.getForClass(String.class)).setParallelism(1);
}
}
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package org.apache.flink.storm.exclamation.operators;
import org.apache.storm.task.OutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.IRichBolt;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.tuple.Values;
import java.util.Map;
/**
* A Bolt implementation that appends exclamation marks to incoming tuples. The number of added exclamation marks can
* be controlled by setting <code>exclamation.count</code>.
*/
public class ExclamationBolt implements IRichBolt {
private static final long serialVersionUID = -6364882114201311380L;
public static final String EXCLAMATION_COUNT = "exclamation.count";
private OutputCollector collector;
private String exclamation;
@SuppressWarnings("rawtypes")
@Override
public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
this.collector = collector;
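// build the exclamation string once from the configured count; when run embedded in Flink,
// this Map contains the StormConfig registered via setGlobalJobParameters(...)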
Object count = conf.get(EXCLAMATION_COUNT);
if (count != null) {
int exclamationNum = (Integer) count;
StringBuilder builder = new StringBuilder();
for (int index = 0; index < exclamationNum; ++index) {
builder.append('!');
}
this.exclamation = builder.toString();
} else {
this.exclamation = "!";
}
}
@Override
public void cleanup() {
}
@Override
public void execute(Tuple tuple) {
collector.emit(tuple, new Values(tuple.getString(0) + this.exclamation));
}
@Override
public void declareOutputFields(OutputFieldsDeclarer declarer) {
declarer.declare(new Fields("word"));
}
@Override
public Map<String, Object> getComponentConfiguration() {
return null;
}
}
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package org.apache.flink.storm.split;
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.api.java.typeutils.TypeExtractor;
import org.apache.flink.storm.split.operators.RandomSpout;
import org.apache.flink.storm.split.operators.VerifyAndEnrichBolt;
import org.apache.flink.storm.util.SplitStreamMapper;
import org.apache.flink.storm.util.SplitStreamType;
import org.apache.flink.storm.util.StormStreamSelector;
import org.apache.flink.storm.wrappers.BoltWrapper;
import org.apache.flink.storm.wrappers.SpoutWrapper;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.SplitStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
/**
* Implements a simple example with two declared output streams for the embedded spout.
*
* <p>This example shows how to:
* <ul>
<li>handle multiple output streams of a spout</li>
<li>access each stream via .split(...) and .select(...)</li>
<li>strip the wrapper data type SplitStreamType for further processing in Flink</li>
* </ul>
*
* <p>This example would work the same way for multiple bolt output streams.
*/
public class SpoutSplitExample {
// *************************************************************************
// PROGRAM
// *************************************************************************
public static void main(final String[] args) throws Exception {
boolean useFile = SpoutSplitExample.parseParameters(args);
// set up the execution environment
final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
String[] rawOutputs = new String[] { RandomSpout.EVEN_STREAM, RandomSpout.ODD_STREAM };
final DataStream<SplitStreamType<Integer>> numbers = env.addSource(
new SpoutWrapper<SplitStreamType<Integer>>(new RandomSpout(true, seed), rawOutputs,
1000), TypeExtractor.getForObject(new SplitStreamType<Integer>()));
SplitStream<SplitStreamType<Integer>> splitStream = numbers
.split(new StormStreamSelector<Integer>());
DataStream<SplitStreamType<Integer>> evenStream = splitStream.select(RandomSpout.EVEN_STREAM);
DataStream<SplitStreamType<Integer>> oddStream = splitStream.select(RandomSpout.ODD_STREAM);
DataStream<Tuple2<String, Integer>> evenResult = evenStream
.map(new SplitStreamMapper<Integer>()).returns(Integer.class).map(new Enrich(true));
DataStream<Tuple2<String, Integer>> oddResult = oddStream.map(
new SplitStreamMapper<Integer>()).transform("oddBolt",
TypeExtractor.getForObject(new Tuple2<String, Integer>("", 0)),
new BoltWrapper<Integer, Tuple2<String, Integer>>(new VerifyAndEnrichBolt(false)));
if (useFile) {
evenResult.writeAsText(outputPath + "/even");
oddResult.writeAsText(outputPath + "/odd");
} else {
evenResult.print();
oddResult.print();
}
// execute program
env.execute("Spout split stream example");
}
// *************************************************************************
// USER FUNCTIONS
// *************************************************************************
/**
* Same as {@link VerifyAndEnrichBolt}.
*/
public static final class Enrich implements MapFunction<Integer, Tuple2<String, Integer>> {
private static final long serialVersionUID = 5213888269197438892L;
private final Tuple2<String, Integer> out;
private final boolean isEven;
public static boolean errorOccured = false;
public Enrich(boolean isEven) {
this.isEven = isEven;
if (isEven) {
this.out = new Tuple2<String, Integer>("even", 0);
} else {
this.out = new Tuple2<String, Integer>("odd", 0);
}
}
@Override
public Tuple2<String, Integer> map(Integer value) throws Exception {
if ((value.intValue() % 2 == 0) != this.isEven) {
errorOccured = true;
}
this.out.setField(value, 1);
return this.out;
}
}
// *************************************************************************
// UTIL METHODS
// *************************************************************************
private static long seed = System.currentTimeMillis();
private static String outputPath = null;
static boolean parseParameters(final String[] args) {
if (args.length > 0) {
// parse input arguments
if (args.length == 2) {
seed = Long.parseLong(args[0]);
outputPath = args[1];
return true;
} else {
throw new IllegalArgumentException(
"Usage: SplitStreamBoltLocal <seed> <result path>");
}
} else {
System.out.println("Executing SplitBoltTopology example with random data");
System.out.println(" Usage: SplitStreamBoltLocal <seed> <result path>");
}
return false;
}
}
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package org.apache.flink.storm.split.operators;
import org.apache.storm.spout.SpoutOutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichSpout;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Values;
import java.util.Map;
import java.util.Random;
/**
* A Spout implementation that emits random numbers, optionally splitting them into odd/even streams.
*/
public class RandomSpout extends BaseRichSpout {
private static final long serialVersionUID = -3978554318742509334L;
public static final String EVEN_STREAM = "even";
public static final String ODD_STREAM = "odd";
private final boolean split;
private Random r = new Random();
private SpoutOutputCollector collector;
public RandomSpout(boolean split, long seed) {
this.split = split;
this.r = new Random(seed);
}
@SuppressWarnings("rawtypes")
@Override
public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
this.collector = collector;
}
@Override
public void nextTuple() {
int i = r.nextInt();
if (split) {
if (i % 2 == 0) {
this.collector.emit(EVEN_STREAM, new Values(i));
} else {
this.collector.emit(ODD_STREAM, new Values(i));
}
} else {
this.collector.emit(new Values(i));
}
}
@Override
public void declareOutputFields(OutputFieldsDeclarer declarer) {
Fields schema = new Fields("number");
if (split) {
declarer.declareStream(EVEN_STREAM, schema);
declarer.declareStream(ODD_STREAM, schema);
} else {
declarer.declare(schema);
}
}
}
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package org.apache.flink.storm.split.operators;
import org.apache.storm.task.OutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichBolt;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.tuple.Values;
import java.util.Map;
/**
* Verifies that incoming numbers are either even or odd, controlled by the constructor argument. Emitted tuples are
* enriched with a new string field containing either "even" or "odd", based on the number's parity.
*/
public class VerifyAndEnrichBolt extends BaseRichBolt {
private static final long serialVersionUID = -7277395570966328721L;
private final boolean evenOrOdd; // true: even -- false: odd
private final String token;
private OutputCollector collector;
public static boolean errorOccured = false;
public VerifyAndEnrichBolt(boolean evenOrOdd) {
this.evenOrOdd = evenOrOdd;
this.token = evenOrOdd ? "even" : "odd";
}
@SuppressWarnings("rawtypes")
@Override
public void prepare(Map stormConf, TopologyContext context, OutputCollector collector) {
this.collector = collector;
}
@Override
public void execute(Tuple input) {
if ((input.getInteger(0) % 2 == 0) != this.evenOrOdd) {
errorOccured = true;
}
this.collector.emit(new Values(this.token, input.getInteger(0)));
}
@Override
public void declareOutputFields(OutputFieldsDeclarer declarer) {
declarer.declare(new Fields("evenOrOdd", "number"));
}
}
/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package org.apache.flink.storm.util;
import org.apache.storm.task.OutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.IRichBolt;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.tuple.Tuple;
import java.util.Map;
/**
* Implements a sink that writes the received data to some external output. The result is formatted like
* {@code (a1, a2, ..., an)} (with {@code Object.toString()} used for each attribute).
*/
public abstract class AbstractBoltSink implements IRichBolt {
private static final long serialVersionUID = -1626323806848080430L;
private StringBuilder lineBuilder;
private String prefix = "";
private final OutputFormatter formatter;
public AbstractBoltSink(final OutputFormatter formatter) {
this.formatter = formatter;
}
@SuppressWarnings("rawtypes")
@Override
public final void prepare(final Map stormConf, final TopologyContext context,
final OutputCollector collector) {
this.prepareSimple(stormConf, context);
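// prefix each output line with the task id if this sink instance runs with a parallelism greater than one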
if (context.getComponentCommon(context.getThisComponentId()).get_parallelism_hint() > 1) {
this.prefix = context.getThisTaskId() + "> ";
}
}
protected abstract void prepareSimple(Map<?, ?> stormConf, TopologyContext context);
@Override
public final void execute(final Tuple input) {
this.lineBuilder = new StringBuilder();
this.lineBuilder.append(this.prefix);
this.lineBuilder.append(this.formatter.format(input));
this.writeExternal(this.lineBuilder.toString());
}
protected abstract void writeExternal(String line);
@Override
public void cleanup() {/* nothing to do */}
@Override
public final void declareOutputFields(final OutputFieldsDeclarer declarer) {/* nothing to do */}
@Override
public Map<String, Object> getComponentConfiguration() {
return null;
}
}
/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package org.apache.flink.storm.util;
import org.apache.storm.spout.SpoutOutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.IRichSpout;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.tuple.Fields;
import java.util.Map;
/**
* Base class for Spouts that read data line by line from an arbitrary source. The declared output schema has a single
* attribute called {@code line} of type {@link String}.
*/
public abstract class AbstractLineSpout implements IRichSpout {
private static final long serialVersionUID = 8876828403487806771L;
public static final String ATTRIBUTE_LINE = "line";
protected SpoutOutputCollector collector;
@SuppressWarnings("rawtypes")
@Override
public void open(final Map conf, final TopologyContext context, final SpoutOutputCollector collector) {
this.collector = collector;
}
@Override
public void close() {/* nothing to do */}
@Override
public void activate() {/* nothing to do */}
@Override
public void deactivate() {/* nothing to do */}
@Override
public void ack(final Object msgId) {/* nothing to do */}
@Override
public void fail(final Object msgId) {/* nothing to do */}
@Override
public void declareOutputFields(final OutputFieldsDeclarer declarer) {
declarer.declare(new Fields(ATTRIBUTE_LINE));
}
@Override
public Map<String, Object> getComponentConfiguration() {
return null;
}
}
/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package org.apache.flink.storm.util;
import org.apache.storm.task.TopologyContext;
import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;
import java.util.Map;
/**
* Implements a sink that writes the received data to the given file (formatted with {@code Object.toString()} for each
* attribute).
*/
public final class BoltFileSink extends AbstractBoltSink {
private static final long serialVersionUID = 2014027288631273666L;
private final String path;
private BufferedWriter writer;
public BoltFileSink(final String path) {
this(path, new SimpleOutputFormatter());
}
public BoltFileSink(final String path, final OutputFormatter formatter) {
super(formatter);
this.path = path;
}
@SuppressWarnings("rawtypes")
@Override
public void prepareSimple(final Map stormConf, final TopologyContext context) {
try {
this.writer = new BufferedWriter(new FileWriter(this.path));
} catch (final IOException e) {
throw new RuntimeException(e);
}
}
@Override
public void writeExternal(final String line) {
try {
this.writer.write(line + "\n");
} catch (final IOException e) {
throw new RuntimeException(e);
}
}
@Override
public void cleanup() {
if (this.writer != null) {
try {
this.writer.close();
} catch (final IOException e) {
throw new RuntimeException(e);
}
}
}
}
/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package org.apache.flink.storm.util;
import org.apache.storm.task.TopologyContext;
import java.util.Map;
/**
* Implements a sink that prints the received data to {@code stdout}.
*/
public final class BoltPrintSink extends AbstractBoltSink {
private static final long serialVersionUID = -6650011223001009519L;
public BoltPrintSink(OutputFormatter formatter) {
super(formatter);
}
@SuppressWarnings("rawtypes")
@Override
public void prepareSimple(final Map stormConf, final TopologyContext context) {
/* nothing to do */
}
@Override
public void writeExternal(final String line) {
System.out.println(line);
}
}
/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package org.apache.flink.storm.util;
import org.apache.storm.spout.SpoutOutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.tuple.Values;
import java.io.BufferedReader;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;
import java.util.Map;
/**
* Implements a Spout that reads data from a given local file.
*/
public class FileSpout extends AbstractLineSpout {
private static final long serialVersionUID = -6996907090003590436L;
public static final String INPUT_FILE_PATH = "input.path";
protected String path = null;
protected BufferedReader reader;
public FileSpout() {}
public FileSpout(final String path) {
this.path = path;
}
@SuppressWarnings("rawtypes")
@Override
public void open(final Map conf, final TopologyContext context, final SpoutOutputCollector collector) {
super.open(conf, context, collector);
Object configuredPath = conf.get(INPUT_FILE_PATH);
if (configuredPath != null) {
this.path = (String) configuredPath;
}
try {
this.reader = new BufferedReader(new FileReader(this.path));
} catch (final FileNotFoundException e) {
throw new RuntimeException(e);
}
}
@Override
public void close() {
if (this.reader != null) {
try {
this.reader.close();
} catch (final IOException e) {
throw new RuntimeException(e);
}
}
}
@Override
public void nextTuple() {
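// readLine() returns null at end-of-file; in that case nothing is emitted and Storm will simply call nextTuple() again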
String line;
try {
line = this.reader.readLine();
if (line != null) {
this.collector.emit(new Values(line));
}
} catch (final IOException e) {
throw new RuntimeException(e);
}
}
}
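As open() above shows, the input path can come from two places, with the configuration entry taking precedence over the constructor argument. A small sketch of both routes (paths are placeholders; imports needed: java.util.HashMap, java.util.Map):
{% highlight java %}
// Sketch only: two ways to point FileSpout at a file.
// 1) via the constructor:
FileSpout byConstructor = new FileSpout("/tmp/input.txt");

// 2) via the configuration map later handed to open(); if both are set,
//    the INPUT_FILE_PATH entry overrides the constructor argument:
Map<String, Object> conf = new HashMap<>();
conf.put(FileSpout.INPUT_FILE_PATH, "/tmp/input.txt");
FileSpout byConfiguration = new FileSpout(); // path resolved in open()
{% endhighlight %}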
/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package org.apache.flink.storm.util;
import org.apache.storm.spout.SpoutOutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.tuple.Values;
import java.io.IOException;
import java.util.Map;
/**
* Implements a Spout that reads data from a given local file. The spout stops automatically
* when it reaches the end of the file.
*/
public class FiniteFileSpout extends FileSpout implements FiniteSpout {
private static final long serialVersionUID = -1472978008607215864L;
private String line;
private boolean newLineRead;
public FiniteFileSpout() {}
public FiniteFileSpout(String path) {
super(path);
}
@SuppressWarnings("rawtypes")
@Override
public void open(final Map conf, final TopologyContext context, final SpoutOutputCollector collector) {
super.open(conf, context, collector);
newLineRead = false;
}
@Override
public void nextTuple() {
this.collector.emit(new Values(line));
newLineRead = false;
}
/**
* May be called any number of times (including zero) before each call to {@code nextTuple()}.
*/
@Override
public boolean reachedEnd() {
try {
readLine();
} catch (IOException e) {
throw new RuntimeException("Exception occured while reading file " + path);
}
return line == null;
}
private void readLine() throws IOException {
if (!newLineRead) {
line = reader.readLine();
newLineRead = true;
}
}
}
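To make the finite-spout contract concrete: reachedEnd() reads and caches at most one line ahead, and nextTuple() then emits the cached line. A simplified driver loop, illustrating what the enclosing runtime does rather than its actual code:
{% highlight java %}
// Illustration only; in a real job the wrapper drives these calls.
FiniteFileSpout spout = new FiniteFileSpout("/tmp/input.txt"); // placeholder path
// The runtime calls spout.open(conf, context, collector) before the loop.
while (!spout.reachedEnd()) { // reads and caches the next line, if any
	spout.nextTuple();        // emits the cached line exactly once
}
spout.close();
{% endhighlight %}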
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package org.apache.flink.storm.util;
/**
* Implements a Spout that reads String[] data stored in memory. The Spout stops automatically when it has emitted
* all of the data.
*/
public class FiniteInMemorySpout extends InMemorySpout<String> implements FiniteSpout {
private static final long serialVersionUID = -4008858647468647019L;
public FiniteInMemorySpout(String[] source) {
super(source);
}
@Override
public boolean reachedEnd() {
return counter >= source.length;
}
}
/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package org.apache.flink.storm.util;
import org.apache.storm.tuple.Values;
/**
* Implements a Spout that reads data stored in memory.
*/
public class InMemorySpout<T> extends AbstractLineSpout {
private static final long serialVersionUID = -4008858647468647019L;
protected T[] source;
protected int counter = 0;
public InMemorySpout(T[] source) {
this.source = source;
}
@Override
public void nextTuple() {
if (this.counter < source.length) {
this.collector.emit(new Values(source[this.counter++]));
}
}
}
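Because InMemorySpout is generic in its element type, a subclass only needs to supply the data array and an output schema, as WordCountInMemorySpout does further below. A hypothetical subclass over integers, purely for illustration:
{% highlight java %}
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.tuple.Fields;

// Hypothetical example (not part of the module): an in-memory spout over Integers.
public class NumberSpout extends InMemorySpout<Integer> {
	private static final long serialVersionUID = 1L;

	public NumberSpout() {
		super(new Integer[] {1, 2, 3});
	}

	@Override
	public void declareOutputFields(final OutputFieldsDeclarer declarer) {
		declarer.declare(new Fields("number"));
	}
}
{% endhighlight %}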
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package org.apache.flink.storm.util;
import org.apache.storm.tuple.Tuple;
import java.io.Serializable;
/**
* Interface that is used to convert Storm {@link Tuple Tuples} to a string before writing them out to a file or to the
* console.
*/
public interface OutputFormatter extends Serializable {
/**
* Converts a Storm {@link Tuple} to a string. This method is used for formatting the output tuples before writing
* them out to a file or to the console.
*
* @param input
* The tuple to be formatted
* @return The string result of the formatting
*/
String format(Tuple input);
}
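For illustration, a custom implementation might spell out every field as key=value pairs. The class below is hypothetical and not part of the module; it relies on Tuple#getFields() being iterable over the field names:
{% highlight java %}
import org.apache.storm.tuple.Tuple;

// Hypothetical OutputFormatter: renders a tuple as space-separated
// field=value pairs, e.g. "word=flink count=3".
public class KeyValueOutputFormatter implements OutputFormatter {
	private static final long serialVersionUID = 1L;

	@Override
	public String format(final Tuple input) {
		final StringBuilder sb = new StringBuilder();
		for (final String field : input.getFields()) {
			sb.append(field).append('=').append(input.getValueByField(field)).append(' ');
		}
		return sb.toString().trim();
	}
}
{% endhighlight %}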
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package org.apache.flink.storm.util;
import org.apache.storm.tuple.Tuple;
/**
* Simple {@link OutputFormatter} implementation to convert {@link Tuple Tuples} with a size of 1 by returning the
* result of {@link Object#toString()} for the first field.
*/
public class SimpleOutputFormatter implements OutputFormatter {
private static final long serialVersionUID = 6349573860144270338L;
/**
* Converts a Storm {@link Tuple} with 1 field to a string by retrieving the value of that field. This method is
* used for formatting raw outputs wrapped in tuples, before writing them out to a file or to the console.
*
* @param input
* The tuple to be formatted
* @return The string result of the formatting
*/
@Override
public String format(final Tuple input) {
if (input.getValues().size() != 1) {
throw new RuntimeException("The output is not raw");
}
return input.getValue(0).toString();
}
}
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package org.apache.flink.storm.util;
import org.apache.storm.tuple.Tuple;
/**
* {@link OutputFormatter} implementation that converts {@link Tuple Tuples} of arbitrary size to a string. For a given
* tuple the output is <code>(field1,field2,...,fieldX)</code>.
*/
public class TupleOutputFormatter implements OutputFormatter {
private static final long serialVersionUID = -599665757723851761L;
@Override
public String format(final Tuple input) {
final StringBuilder stringBuilder = new StringBuilder();
stringBuilder.append("(");
for (final Object attribute : input.getValues()) {
stringBuilder.append(attribute);
stringBuilder.append(",");
}
stringBuilder.replace(stringBuilder.length() - 1, stringBuilder.length(), ")");
return stringBuilder.toString();
}
}
/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package org.apache.flink.storm.wordcount;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.api.java.typeutils.TypeExtractor;
import org.apache.flink.storm.wordcount.operators.BoltTokenizer;
import org.apache.flink.storm.wordcount.util.WordCountData;
import org.apache.flink.storm.wrappers.BoltWrapper;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.storm.topology.IRichBolt;
/**
* Implements the "WordCount" program that computes a simple word occurrence histogram over text files in a streaming
* fashion. The tokenizer step is performed by a {@link IRichBolt Bolt}.
*
* <p>The input is a plain text file with lines separated by newline characters.
*
* <p>Usage: <code>WordCount &lt;text path&gt; &lt;result path&gt;</code><br>
* If no parameters are provided, the program is run with default data from {@link WordCountData}.
*
* <p>This example shows how to:
* <ul>
* <li>use a Bolt within a Flink Streaming program.</li>
* </ul>
*/
public class BoltTokenizerWordCount {
// *************************************************************************
// PROGRAM
// *************************************************************************
public static void main(final String[] args) throws Exception {
if (!parseParameters(args)) {
return;
}
// set up the execution environment
final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
// get input data
final DataStream<String> text = getTextDataStream(env);
final DataStream<Tuple2<String, Integer>> counts = text
// split up the lines into pairs (2-tuples) containing: (word,1)
// this is done by a bolt that is wrapped accordingly
.transform("BoltTokenizer",
TypeExtractor.getForObject(new Tuple2<String, Integer>("", 0)),
new BoltWrapper<String, Tuple2<String, Integer>>(new BoltTokenizer()))
// group by the tuple field "0" and sum up tuple field "1"
.keyBy(0).sum(1);
// emit result
if (fileOutput) {
counts.writeAsText(outputPath);
} else {
counts.print();
}
// execute program
env.execute("Streaming WordCount with bolt tokenizer");
}
// *************************************************************************
// UTIL METHODS
// *************************************************************************
private static boolean fileOutput = false;
private static String textPath;
private static String outputPath;
private static boolean parseParameters(final String[] args) {
if (args.length > 0) {
// parse input arguments
fileOutput = true;
if (args.length == 2) {
textPath = args[0];
outputPath = args[1];
} else {
System.err.println("Usage: BoltTokenizerWordCount <text path> <result path>");
return false;
}
} else {
System.out.println("Executing BoltTokenizerWordCount example with built-in default data");
System.out.println(" Provide parameters to read input data from a file");
System.out.println(" Usage: BoltTokenizerWordCount <text path> <result path>");
}
return true;
}
private static DataStream<String> getTextDataStream(final StreamExecutionEnvironment env) {
if (fileOutput) {
// read the text file from given input path
return env.readTextFile(textPath);
}
return env.fromElements(WordCountData.WORDS);
}
}
/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package org.apache.flink.storm.wordcount;
import org.apache.flink.api.java.io.CsvInputFormat;
import org.apache.flink.api.java.io.PojoCsvInputFormat;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.api.java.typeutils.PojoTypeInfo;
import org.apache.flink.api.java.typeutils.TypeExtractor;
import org.apache.flink.core.fs.Path;
import org.apache.flink.storm.wordcount.operators.BoltTokenizerByName;
import org.apache.flink.storm.wordcount.operators.WordCountDataPojos;
import org.apache.flink.storm.wordcount.operators.WordCountDataPojos.Sentence;
import org.apache.flink.storm.wordcount.util.WordCountData;
import org.apache.flink.storm.wrappers.BoltWrapper;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.storm.topology.IRichBolt;
/**
* Implements the "WordCount" program that computes a simple word occurrence histogram over text files in a streaming
* fashion. The tokenizer step is performed by a {@link IRichBolt Bolt}. In contrast to {@link BoltTokenizerWordCount}
* the tokenizer's input is a POJO type and the single field is accessed by name.
*
* <p>The input is a plain text file with lines separated by newline characters.
*
* <p>Usage: <code>WordCount &lt;text path&gt; &lt;result path&gt;</code><br>
* If no parameters are provided, the program is run with default data from {@link WordCountData}.
*
* <p>This example shows how to:
* <ul>
* <li>how to access attributes by name within a Bolt for POJO type input streams
* </ul>
*/
public class BoltTokenizerWordCountPojo {
// *************************************************************************
// PROGRAM
// *************************************************************************
public static void main(final String[] args) throws Exception {
if (!parseParameters(args)) {
return;
}
// set up the execution environment
final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
// get input data
final DataStream<Sentence> text = getTextDataStream(env);
final DataStream<Tuple2<String, Integer>> counts = text
// split up the lines into pairs (2-tuples) containing: (word,1)
// this is done by a bolt that is wrapped accordingly
.transform("BoltTokenizerPojo",
TypeExtractor.getForObject(new Tuple2<String, Integer>("", 0)),
new BoltWrapper<Sentence, Tuple2<String, Integer>>(new BoltTokenizerByName()))
// group by the tuple field "0" and sum up tuple field "1"
.keyBy(0).sum(1);
// emit result
if (fileOutput) {
counts.writeAsText(outputPath);
} else {
counts.print();
}
// execute program
env.execute("Streaming WordCount with POJO bolt tokenizer");
}
// *************************************************************************
// UTIL METHODS
// *************************************************************************
private static boolean fileOutput = false;
private static String textPath;
private static String outputPath;
private static boolean parseParameters(final String[] args) {
if (args.length > 0) {
// parse input arguments
fileOutput = true;
if (args.length == 2) {
textPath = args[0];
outputPath = args[1];
} else {
System.err.println("Usage: BoltTokenizerWordCountPojo <text path> <result path>");
return false;
}
} else {
System.out
.println("Executing BoltTokenizerWordCountPojo example with built-in default data");
System.out.println(" Provide parameters to read input data from a file");
System.out.println(" Usage: BoltTokenizerWordCountPojo <text path> <result path>");
}
return true;
}
private static DataStream<Sentence> getTextDataStream(final StreamExecutionEnvironment env) {
if (fileOutput) {
// read the text file from given input path
PojoTypeInfo<Sentence> sourceType = (PojoTypeInfo<Sentence>) TypeExtractor
.getForObject(new Sentence(""));
return env.createInput(new PojoCsvInputFormat<Sentence>(new Path(
textPath), CsvInputFormat.DEFAULT_LINE_DELIMITER,
CsvInputFormat.DEFAULT_LINE_DELIMITER, sourceType),
sourceType);
}
return env.fromElements(WordCountDataPojos.SENTENCES);
}
}
/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package org.apache.flink.storm.wordcount;
import org.apache.flink.api.java.io.CsvInputFormat;
import org.apache.flink.api.java.io.TupleCsvInputFormat;
import org.apache.flink.api.java.tuple.Tuple;
import org.apache.flink.api.java.tuple.Tuple1;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.api.java.typeutils.TupleTypeInfo;
import org.apache.flink.api.java.typeutils.TypeExtractor;
import org.apache.flink.core.fs.Path;
import org.apache.flink.storm.wordcount.operators.BoltTokenizerByName;
import org.apache.flink.storm.wordcount.operators.WordCountDataTuple;
import org.apache.flink.storm.wordcount.util.WordCountData;
import org.apache.flink.storm.wrappers.BoltWrapper;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.storm.topology.IRichBolt;
import org.apache.storm.tuple.Fields;
/**
* Implements the "WordCount" program that computes a simple word occurrence histogram over text files in a streaming
* fashion. The tokenizer step is performed by a {@link IRichBolt Bolt}. In contrast to {@link BoltTokenizerWordCount}
* the tokenizer's input is a {@link Tuple} type and the single field is accessed by name.
*
* <p>The input is a plain text file with lines separated by newline characters.
*
* <p>Usage: <code>WordCount &lt;text path&gt; &lt;result path&gt;</code><br>
* If no parameters are provided, the program is run with default data from {@link WordCountData}.
*
* <p>This example shows how to:
* <ul>
* <li>how to access attributes by name within a Bolt for {@link Tuple} type input streams
* </ul>
*/
public class BoltTokenizerWordCountWithNames {
// *************************************************************************
// PROGRAM
// *************************************************************************
public static void main(final String[] args) throws Exception {
if (!parseParameters(args)) {
return;
}
// set up the execution environment
final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
// get input data
final DataStream<Tuple1<String>> text = getTextDataStream(env);
final DataStream<Tuple2<String, Integer>> counts = text
// split up the lines into pairs (2-tuples) containing: (word,1)
// this is done by a Storm bolt that is wrapped accordingly
.transform(
"BoltTokenizerWithNames",
TypeExtractor.getForObject(new Tuple2<String, Integer>("", 0)),
new BoltWrapper<Tuple1<String>, Tuple2<String, Integer>>(
new BoltTokenizerByName(), new Fields("sentence")))
// group by the tuple field "0" and sum up tuple field "1"
.keyBy(0).sum(1);
// emit result
if (fileOutput) {
counts.writeAsText(outputPath);
} else {
counts.print();
}
// execute program
env.execute("Streaming WordCount with schema bolt tokenizer");
}
// *************************************************************************
// UTIL METHODS
// *************************************************************************
private static boolean fileOutput = false;
private static String textPath;
private static String outputPath;
private static boolean parseParameters(final String[] args) {
if (args.length > 0) {
// parse input arguments
fileOutput = true;
if (args.length == 2) {
textPath = args[0];
outputPath = args[1];
} else {
System.err.println("Usage: BoltTokenizerWordCountWithNames <text path> <result path>");
return false;
}
} else {
System.out.println("Executing BoltTokenizerWordCountWithNames example with built-in default data");
System.out.println(" Provide parameters to read input data from a file");
System.out.println(" Usage: BoltTokenizerWordCountWithNames <text path> <result path>");
}
return true;
}
private static DataStream<Tuple1<String>> getTextDataStream(final StreamExecutionEnvironment env) {
if (fileOutput) {
// read the text file from given input path
TupleTypeInfo<Tuple1<String>> sourceType = (TupleTypeInfo<Tuple1<String>>) TypeExtractor
.getForObject(new Tuple1<String>(""));
return env.createInput(new TupleCsvInputFormat<Tuple1<String>>(new Path(
textPath), CsvInputFormat.DEFAULT_LINE_DELIMITER,
CsvInputFormat.DEFAULT_LINE_DELIMITER, sourceType),
sourceType);
}
return env.fromElements(WordCountDataTuple.TUPLES);
}
}
/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package org.apache.flink.storm.wordcount;
import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.api.java.typeutils.TypeExtractor;
import org.apache.flink.storm.wordcount.operators.WordCountFileSpout;
import org.apache.flink.storm.wordcount.operators.WordCountInMemorySpout;
import org.apache.flink.storm.wordcount.util.WordCountData;
import org.apache.flink.storm.wrappers.SpoutWrapper;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.util.Collector;
import org.apache.storm.topology.IRichSpout;
import org.apache.storm.utils.Utils;
/**
* Implements the "WordCount" program that computes a simple word occurrence histogram over text files in a streaming
* fashion. The used data source is a {@link IRichSpout Spout}.
*
* <p>The input is a plain text file with lines separated by newline characters.
*
* <p>Usage: <code>WordCount &lt;text path&gt; &lt;result path&gt;</code><br>
* If no parameters are provided, the program is run with default data from {@link WordCountData}.
*
* <p>This example shows how to:
* <ul>
* <li>use a Spout within a Flink Streaming program.</li>
* </ul>
*/
public class SpoutSourceWordCount {
// *************************************************************************
// PROGRAM
// *************************************************************************
public static void main(final String[] args) throws Exception {
if (!parseParameters(args)) {
return;
}
// set up the execution environment
final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
// get input data
final DataStream<String> text = getTextDataStream(env);
final DataStream<Tuple2<String, Integer>> counts =
// split up the lines into pairs (2-tuples) containing: (word,1)
text.flatMap(new Tokenizer())
// group by the tuple field "0" and sum up tuple field "1"
.keyBy(0).sum(1);
// emit result
if (fileOutput) {
counts.writeAsText(outputPath);
} else {
counts.print();
}
// execute program
env.execute("Streaming WordCount with spout source");
}
// *************************************************************************
// USER FUNCTIONS
// *************************************************************************
/**
* Implements the string tokenizer that splits sentences into words as a user-defined FlatMapFunction. The function
* takes a line (String) and splits it into multiple pairs in the form of "(word,1)" ({@code Tuple2<String, Integer>}).
*/
public static final class Tokenizer implements FlatMapFunction<String, Tuple2<String, Integer>> {
private static final long serialVersionUID = 1L;
@Override
public void flatMap(final String value, final Collector<Tuple2<String, Integer>> out) throws Exception {
// normalize and split the line
final String[] tokens = value.toLowerCase().split("\\W+");
// emit the pairs
for (final String token : tokens) {
if (token.length() > 0) {
out.collect(new Tuple2<String, Integer>(token, 1));
}
}
}
}
// *************************************************************************
// UTIL METHODS
// *************************************************************************
private static boolean fileOutput = false;
private static String textPath;
private static String outputPath;
private static boolean parseParameters(final String[] args) {
if (args.length > 0) {
// parse input arguments
fileOutput = true;
if (args.length == 2) {
textPath = args[0];
outputPath = args[1];
} else {
System.err.println("Usage: SpoutSourceWordCount <text path> <result path>");
return false;
}
} else {
System.out.println("Executing SpoutSourceWordCount example with built-in default data");
System.out.println(" Provide parameters to read input data from a file");
System.out.println(" Usage: SpoutSourceWordCount <text path> <result path>");
}
return true;
}
private static DataStream<String> getTextDataStream(final StreamExecutionEnvironment env) {
if (fileOutput) {
// read the text file from given input path
final String[] tokens = textPath.split(":");
final String localFile = tokens[tokens.length - 1];
return env.addSource(
new SpoutWrapper<String>(new WordCountFileSpout(localFile),
new String[] { Utils.DEFAULT_STREAM_ID }, -1),
TypeExtractor.getForClass(String.class)).setParallelism(1);
}
return env.addSource(
new SpoutWrapper<String>(new WordCountInMemorySpout(),
new String[] { Utils.DEFAULT_STREAM_ID }, -1),
TypeExtractor.getForClass(String.class)).setParallelism(1);
}
}
/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package org.apache.flink.storm.wordcount.operators;
import org.apache.storm.task.OutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.IRichBolt;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.tuple.Values;
import java.util.HashMap;
import java.util.Map;
/**
* Implements the word counter that counts the occurrence of each unique word. The bolt takes a pair (input tuple
* schema: {@code <String,Integer>}) and sums the given word count for each unique word (output tuple schema:
* {@code <String,Integer>} ).
*
* <p>Same as {@link BoltCounterByName}, but accesses input attribute by index (instead of name).
*/
public class BoltCounter implements IRichBolt {
private static final long serialVersionUID = 399619605462625934L;
public static final String ATTRIBUTE_WORD = "word";
public static final String ATTRIBUTE_COUNT = "count";
private final HashMap<String, Count> counts = new HashMap<String, Count>();
private OutputCollector collector;
@SuppressWarnings("rawtypes")
@Override
public void prepare(final Map stormConf, final TopologyContext context, final OutputCollector collector) {
this.collector = collector;
}
@Override
public void execute(final Tuple input) {
final String word = input.getString(BoltTokenizer.ATTRIBUTE_WORD_INDEX);
Count currentCount = this.counts.get(word);
if (currentCount == null) {
currentCount = new Count();
this.counts.put(word, currentCount);
}
currentCount.count += input.getInteger(BoltTokenizer.ATTRIBUTE_COUNT_INDEX);
this.collector.emit(new Values(word, currentCount.count));
}
@Override
public void cleanup() {/* nothing to do */}
@Override
public void declareOutputFields(final OutputFieldsDeclarer declarer) {
declarer.declare(new Fields(ATTRIBUTE_WORD, ATTRIBUTE_COUNT));
}
@Override
public Map<String, Object> getComponentConfiguration() {
return null;
}
/**
* A counter helper used when emitting count tuples to the given collector, avoiding unnecessary object
* creation/deletion.
*/
private static final class Count {
public int count;
public Count() {/* nothing to do */}
}
}
/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package org.apache.flink.storm.wordcount.operators;
import org.apache.storm.task.OutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.IRichBolt;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.tuple.Values;
import java.util.HashMap;
import java.util.Map;
/**
* Implements the word counter that counts the occurrence of each unique word. The bolt takes a pair (input tuple
* schema: {@code <String,Integer>}) and sums the given word count for each unique word (output tuple schema:
* {@code <String,Integer>} ).
*
* <p>Same as {@link BoltCounter}, but accesses input attribute by name (instead of index).
*/
public class BoltCounterByName implements IRichBolt {
private static final long serialVersionUID = 399619605462625934L;
public static final String ATTRIBUTE_WORD = "word";
public static final String ATTRIBUTE_COUNT = "count";
private final HashMap<String, Count> counts = new HashMap<String, Count>();
private OutputCollector collector;
@SuppressWarnings("rawtypes")
@Override
public void prepare(final Map stormConf, final TopologyContext context, final OutputCollector collector) {
this.collector = collector;
}
@Override
public void execute(final Tuple input) {
final String word = input.getStringByField(BoltTokenizer.ATTRIBUTE_WORD);
Count currentCount = this.counts.get(word);
if (currentCount == null) {
currentCount = new Count();
this.counts.put(word, currentCount);
}
currentCount.count += input.getIntegerByField(BoltTokenizer.ATTRIBUTE_COUNT);
this.collector.emit(new Values(word, currentCount.count));
}
@Override
public void cleanup() {/* nothing to do */}
@Override
public void declareOutputFields(final OutputFieldsDeclarer declarer) {
declarer.declare(new Fields(ATTRIBUTE_WORD, ATTRIBUTE_COUNT));
}
@Override
public Map<String, Object> getComponentConfiguration() {
return null;
}
/**
* A counter helper used when emitting count tuples to the given collector, avoiding unnecessary object
* creation/deletion.
*/
private static final class Count {
public int count;
public Count() {/* nothing to do */}
}
}
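A hedged sketch of chaining the tokenizer with this counter inside one Flink program: the keyBy(0) between the two transforms is an assumption, not taken from these sources, and routes equal words to the same counter instance so that each per-instance HashMap sees every occurrence of its words. The input schema Fields("word", "count") follows the two-argument BoltWrapper pattern used by BoltTokenizerWordCountWithNames above, since this bolt reads its input by field name:
{% highlight java %}
// Sketch only: env is an existing StreamExecutionEnvironment and
// WordCountData supplies the sample sentences used throughout this module.
DataStream<String> text = env.fromElements(WordCountData.WORDS);
DataStream<Tuple2<String, Integer>> counts = text
	.transform("BoltTokenizer",
		TypeExtractor.getForObject(new Tuple2<String, Integer>("", 0)),
		new BoltWrapper<String, Tuple2<String, Integer>>(new BoltTokenizer()))
	.keyBy(0) // assumption: partition by word before counting
	.transform("BoltCounterByName",
		TypeExtractor.getForObject(new Tuple2<String, Integer>("", 0)),
		new BoltWrapper<Tuple2<String, Integer>, Tuple2<String, Integer>>(
			new BoltCounterByName(), new Fields("word", "count")));
counts.print();
{% endhighlight %}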
/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package org.apache.flink.storm.wordcount.operators;
import org.apache.storm.task.OutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.IRichBolt;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.tuple.Values;
import java.util.Map;
/**
* Implements the string tokenizer that splits sentences into words as a bolt. The bolt takes a line (input tuple
* schema: {@code <String>}) and splits it into multiple pairs in the form of "(word,1)" (output tuple schema:
* {@code <String,Integer>}).
*
* <p>Same as {@link BoltTokenizerByName}, but accesses input attribute by index (instead of name).
*/
public final class BoltTokenizer implements IRichBolt {
private static final long serialVersionUID = -8589620297208175149L;
public static final String ATTRIBUTE_WORD = "word";
public static final String ATTRIBUTE_COUNT = "count";
public static final int ATTRIBUTE_WORD_INDEX = 0;
public static final int ATTRIBUTE_COUNT_INDEX = 1;
private OutputCollector collector;
@SuppressWarnings("rawtypes")
@Override
public void prepare(final Map stormConf, final TopologyContext context, final OutputCollector collector) {
this.collector = collector;
}
@Override
public void execute(final Tuple input) {
final String[] tokens = input.getString(0).toLowerCase().split("\\W+");
for (final String token : tokens) {
if (token.length() > 0) {
this.collector.emit(new Values(token, 1));
}
}
}
@Override
public void cleanup() {/* nothing to do */}
@Override
public void declareOutputFields(final OutputFieldsDeclarer declarer) {
declarer.declare(new Fields(ATTRIBUTE_WORD, ATTRIBUTE_COUNT));
}
@Override
public Map<String, Object> getComponentConfiguration() {
return null;
}
}
/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package org.apache.flink.storm.wordcount.operators;
import org.apache.storm.task.OutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.IRichBolt;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.tuple.Values;
import java.util.Map;
/**
* Implements the string tokenizer that splits sentences into words as a bolt. The bolt takes a line (input tuple
* schema: {@code <String>}) and splits it into multiple pairs in the form of "(word,1)" (output tuple schema:
* {@code <String,Integer>}).
*
* <p>Same as {@link BoltTokenizer}, but accesses input attribute by name (instead of index).
*/
public final class BoltTokenizerByName implements IRichBolt {
private static final long serialVersionUID = -8589620297208175149L;
public static final String ATTRIBUTE_WORD = "word";
public static final String ATTRIBUTE_COUNT = "count";
public static final int ATTRIBUTE_WORD_INDEX = 0;
public static final int ATTRIBUTE_COUNT_INDEX = 1;
private OutputCollector collector;
@SuppressWarnings("rawtypes")
@Override
public void prepare(final Map stormConf, final TopologyContext context, final OutputCollector collector) {
this.collector = collector;
}
@Override
public void execute(final Tuple input) {
final String[] tokens = input.getStringByField("sentence").toLowerCase().split("\\W+");
for (final String token : tokens) {
if (token.length() > 0) {
this.collector.emit(new Values(token, 1));
}
}
}
@Override
public void cleanup() {/* nothing to do */}
@Override
public void declareOutputFields(final OutputFieldsDeclarer declarer) {
declarer.declare(new Fields(ATTRIBUTE_WORD, ATTRIBUTE_COUNT));
}
@Override
public Map<String, Object> getComponentConfiguration() {
return null;
}
}
/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package org.apache.flink.storm.wordcount.operators;
import org.apache.flink.storm.wordcount.util.WordCountData;
import java.io.Serializable;
/**
* Input POJOs for WordCount programs.
*/
public class WordCountDataPojos {
public static final Sentence[] SENTENCES;
static {
SENTENCES = new Sentence[WordCountData.WORDS.length];
for (int i = 0; i < SENTENCES.length; ++i) {
SENTENCES[i] = new Sentence(WordCountData.WORDS[i]);
}
}
/**
* Simple POJO containing a string.
*/
public static class Sentence implements Serializable {
private static final long serialVersionUID = -7336372859203407522L;
private String sentence;
public Sentence() {
}
public Sentence(String sentence) {
this.sentence = sentence;
}
public String getSentence() {
return sentence;
}
public void setSentence(String sentence) {
this.sentence = sentence;
}
@Override
public String toString() {
return "(" + this.sentence + ")";
}
}
}
/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package org.apache.flink.storm.wordcount.operators;
import org.apache.flink.api.java.tuple.Tuple1;
import org.apache.flink.storm.wordcount.util.WordCountData;
/**
* Input tuples for WordCount programs.
*/
@SuppressWarnings("unchecked")
public class WordCountDataTuple {
public static final Tuple1<String>[] TUPLES;
static {
TUPLES = new Tuple1[WordCountData.WORDS.length];
for (int i = 0; i < TUPLES.length; ++i) {
TUPLES[i] = new Tuple1<String>(WordCountData.WORDS[i]);
}
}
}
/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package org.apache.flink.storm.wordcount.operators;
import org.apache.flink.storm.util.FileSpout;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.tuple.Fields;
/**
* Implements a Spout that reads data from a given local file.
*/
public final class WordCountFileSpout extends FileSpout {
private static final long serialVersionUID = 2372251989250954503L;
public WordCountFileSpout(String path) {
super(path);
}
@Override
public void declareOutputFields(final OutputFieldsDeclarer declarer) {
declarer.declare(new Fields("sentence"));
}
}
/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package org.apache.flink.storm.wordcount.operators;
import org.apache.flink.storm.util.FiniteInMemorySpout;
import org.apache.flink.storm.wordcount.util.WordCountData;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.tuple.Fields;
/**
* Implements a Spout that reads data from {@link WordCountData#WORDS}.
*/
public final class WordCountInMemorySpout extends FiniteInMemorySpout {
private static final long serialVersionUID = 8832143302409465843L;
public WordCountInMemorySpout() {
super(WordCountData.WORDS);
}
@Override
public void declareOutputFields(final OutputFieldsDeclarer declarer) {
declarer.declare(new Fields("sentence"));
}
}
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package org.apache.flink.storm.wordcount.util;
/**
* Provides the default data sets used for the WordCount example program.
* The default data sets are used, if no parameters are given to the program.
*/
public class WordCountData {
public static final String[] WORDS = new String[]{
"To be, or not to be,--that is the question:--",
"Whether 'tis nobler in the mind to suffer",
"The slings and arrows of outrageous fortune",
"Or to take arms against a sea of troubles,",
"And by opposing end them?--To die,--to sleep,--",
"No more; and by a sleep to say we end",
"The heartache, and the thousand natural shocks",
"That flesh is heir to,--'tis a consummation",
"Devoutly to be wish'd. To die,--to sleep;--",
"To sleep! perchance to dream:--ay, there's the rub;",
"For in that sleep of death what dreams may come,",
"When we have shuffled off this mortal coil,",
"Must give us pause: there's the respect",
"That makes calamity of so long life;",
"For who would bear the whips and scorns of time,",
"The oppressor's wrong, the proud man's contumely,",
"The pangs of despis'd love, the law's delay,",
"The insolence of office, and the spurns",
"That patient merit of the unworthy takes,",
"When he himself might his quietus make",
"With a bare bodkin? who would these fardels bear,",
"To grunt and sweat under a weary life,",
"But that the dread of something after death,--",
"The undiscover'd country, from whose bourn",
"No traveller returns,--puzzles the will,",
"And makes us rather bear those ills we have",
"Than fly to others that we know not of?",
"Thus conscience does make cowards of us all;",
"And thus the native hue of resolution",
"Is sicklied o'er with the pale cast of thought;",
"And enterprises of great pith and moment,",
"With this regard, their currents turn awry,",
"And lose the name of action.--Soft you now!",
"The fair Ophelia!--Nymph, in thy orisons",
"Be all my sins remember'd."
};
}
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package org.apache.flink.storm.exclamation;
import org.apache.flink.storm.exclamation.util.ExclamationData;
import org.apache.flink.test.testdata.WordCountData;
import org.apache.flink.test.util.AbstractTestBase;
import org.junit.Test;
/**
* Test for the ExclamationWithBolt example.
*/
public class ExclamationWithBoltITCase extends AbstractTestBase {
@Test
public void testProgram() throws Exception {
String textPath = createTempFile("text.txt", WordCountData.TEXT);
String resultPath = getTempDirPath("result");
String exclamationNum = "3";
ExclamationWithBolt.main(new String[]{textPath, resultPath, exclamationNum});
compareResultsByLinesInMemory(ExclamationData.TEXT_WITH_EXCLAMATIONS, resultPath);
}
}
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package org.apache.flink.storm.exclamation;
import org.apache.flink.storm.exclamation.util.ExclamationData;
import org.apache.flink.test.testdata.WordCountData;
import org.apache.flink.test.util.AbstractTestBase;
import org.junit.Test;
/**
* Test for the ExclamationWithSpout example.
*/
public class ExclamationWithSpoutITCase extends AbstractTestBase {
@Test
public void testProgram() throws Exception {
String textPath = createTempFile("text.txt", WordCountData.TEXT);
String resultPath = getTempDirPath("result");
ExclamationWithSpout.main(new String[]{textPath, resultPath});
compareResultsByLinesInMemory(ExclamationData.TEXT_WITH_EXCLAMATIONS, resultPath);
}
}
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package org.apache.flink.storm.exclamation.util;
/**
* Expected output of Exclamation programs.
*/
public class ExclamationData {
public static final String TEXT_WITH_EXCLAMATIONS =
"Goethe - Faust: Der Tragoedie erster Teil!!!!!!\n"
+ "Prolog im Himmel.!!!!!!\n"
+ "Der Herr. Die himmlischen Heerscharen. Nachher Mephistopheles. Die drei!!!!!!\n"
+ "Erzengel treten vor.!!!!!!\n"
+ "RAPHAEL: Die Sonne toent, nach alter Weise, In Brudersphaeren Wettgesang,!!!!!!\n"
+ "Und ihre vorgeschriebne Reise Vollendet sie mit Donnergang. Ihr Anblick!!!!!!\n"
+ "gibt den Engeln Staerke, Wenn keiner Sie ergruenden mag; die unbegreiflich!!!!!!\n"
+ "hohen Werke Sind herrlich wie am ersten Tag.!!!!!!\n"
+ "GABRIEL: Und schnell und unbegreiflich schnelle Dreht sich umher der Erde!!!!!!\n"
+ "Pracht; Es wechselt Paradieseshelle Mit tiefer, schauervoller Nacht. Es!!!!!!\n"
+ "schaeumt das Meer in breiten Fluessen Am tiefen Grund der Felsen auf, Und!!!!!!\n"
+ "Fels und Meer wird fortgerissen Im ewig schnellem Sphaerenlauf.!!!!!!\n"
+ "MICHAEL: Und Stuerme brausen um die Wette Vom Meer aufs Land, vom Land!!!!!!\n"
+ "aufs Meer, und bilden wuetend eine Kette Der tiefsten Wirkung rings umher.!!!!!!\n"
+ "Da flammt ein blitzendes Verheeren Dem Pfade vor des Donnerschlags. Doch!!!!!!\n"
+ "deine Boten, Herr, verehren Das sanfte Wandeln deines Tags.!!!!!!\n"
+ "ZU DREI: Der Anblick gibt den Engeln Staerke, Da keiner dich ergruenden!!!!!!\n"
+ "mag, Und alle deine hohen Werke Sind herrlich wie am ersten Tag.!!!!!!\n"
+ "MEPHISTOPHELES: Da du, o Herr, dich einmal wieder nahst Und fragst, wie!!!!!!\n"
+ "alles sich bei uns befinde, Und du mich sonst gewoehnlich gerne sahst, So!!!!!!\n"
+ "siehst du mich auch unter dem Gesinde. Verzeih, ich kann nicht hohe Worte!!!!!!\n"
+ "machen, Und wenn mich auch der ganze Kreis verhoehnt; Mein Pathos braechte!!!!!!\n"
+ "dich gewiss zum Lachen, Haettst du dir nicht das Lachen abgewoehnt. Von!!!!!!\n"
+ "Sonn' und Welten weiss ich nichts zu sagen, Ich sehe nur, wie sich die!!!!!!\n"
+ "Menschen plagen. Der kleine Gott der Welt bleibt stets von gleichem!!!!!!\n"
+ "Schlag, Und ist so wunderlich als wie am ersten Tag. Ein wenig besser!!!!!!\n"
+ "wuerd er leben, Haettst du ihm nicht den Schein des Himmelslichts gegeben;!!!!!!\n"
+ "Er nennt's Vernunft und braucht's allein, Nur tierischer als jedes Tier!!!!!!\n"
+ "zu sein. Er scheint mir, mit Verlaub von euer Gnaden, Wie eine der!!!!!!\n"
+ "langbeinigen Zikaden, Die immer fliegt und fliegend springt Und gleich im!!!!!!\n"
+ "Gras ihr altes Liedchen singt; Und laeg er nur noch immer in dem Grase! In!!!!!!\n"
+ "jeden Quark begraebt er seine Nase.!!!!!!\n"
+ "DER HERR: Hast du mir weiter nichts zu sagen? Kommst du nur immer!!!!!!\n"
+ "anzuklagen? Ist auf der Erde ewig dir nichts recht?!!!!!!\n"
+ "MEPHISTOPHELES: Nein Herr! ich find es dort, wie immer, herzlich!!!!!!\n"
+ "schlecht. Die Menschen dauern mich in ihren Jammertagen, Ich mag sogar!!!!!!\n"
+ "die armen selbst nicht plagen.!!!!!!\n" + "DER HERR: Kennst du den Faust?!!!!!!\n"
+ "MEPHISTOPHELES: Den Doktor?!!!!!!\n"
+ "DER HERR: Meinen Knecht!!!!!!!\n"
+ "MEPHISTOPHELES: Fuerwahr! er dient Euch auf besondre Weise. Nicht irdisch!!!!!!\n"
+ "ist des Toren Trank noch Speise. Ihn treibt die Gaerung in die Ferne, Er!!!!!!\n"
+ "ist sich seiner Tollheit halb bewusst; Vom Himmel fordert er die schoensten!!!!!!\n"
+ "Sterne Und von der Erde jede hoechste Lust, Und alle Naeh und alle Ferne!!!!!!\n"
+ "Befriedigt nicht die tiefbewegte Brust.!!!!!!\n"
+ "DER HERR: Wenn er mir auch nur verworren dient, So werd ich ihn bald in!!!!!!\n"
+ "die Klarheit fuehren. Weiss doch der Gaertner, wenn das Baeumchen gruent, Das!!!!!!\n"
+ "Bluet und Frucht die kuenft'gen Jahre zieren.!!!!!!\n"
+ "MEPHISTOPHELES: Was wettet Ihr? den sollt Ihr noch verlieren! Wenn Ihr!!!!!!\n"
+ "mir die Erlaubnis gebt, Ihn meine Strasse sacht zu fuehren.!!!!!!\n"
+ "DER HERR: Solang er auf der Erde lebt, So lange sei dir's nicht verboten,!!!!!!\n"
+ "Es irrt der Mensch so lang er strebt.!!!!!!\n"
+ "MEPHISTOPHELES: Da dank ich Euch; denn mit den Toten Hab ich mich niemals!!!!!!\n"
+ "gern befangen. Am meisten lieb ich mir die vollen, frischen Wangen. Fuer!!!!!!\n"
+ "einem Leichnam bin ich nicht zu Haus; Mir geht es wie der Katze mit der Maus.!!!!!!\n"
+ "DER HERR: Nun gut, es sei dir ueberlassen! Zieh diesen Geist von seinem!!!!!!\n"
+ "Urquell ab, Und fuehr ihn, kannst du ihn erfassen, Auf deinem Wege mit!!!!!!\n"
+ "herab, Und steh beschaemt, wenn du bekennen musst: Ein guter Mensch, in!!!!!!\n"
+ "seinem dunklen Drange, Ist sich des rechten Weges wohl bewusst.!!!!!!\n"
+ "MEPHISTOPHELES: Schon gut! nur dauert es nicht lange. Mir ist fuer meine!!!!!!\n"
+ "Wette gar nicht bange. Wenn ich zu meinem Zweck gelange, Erlaubt Ihr mir!!!!!!\n"
+ "Triumph aus voller Brust. Staub soll er fressen, und mit Lust, Wie meine!!!!!!\n"
+ "Muhme, die beruehmte Schlange.!!!!!!\n"
+ "DER HERR: Du darfst auch da nur frei erscheinen; Ich habe deinesgleichen!!!!!!\n"
+ "nie gehasst. Von allen Geistern, die verneinen, ist mir der Schalk am!!!!!!\n"
+ "wenigsten zur Last. Des Menschen Taetigkeit kann allzu leicht erschlaffen,!!!!!!\n"
+ "er liebt sich bald die unbedingte Ruh; Drum geb ich gern ihm den Gesellen!!!!!!\n"
+ "zu, Der reizt und wirkt und muss als Teufel schaffen. Doch ihr, die echten!!!!!!\n"
+ "Goettersoehne, Erfreut euch der lebendig reichen Schoene! Das Werdende, das!!!!!!\n"
+ "ewig wirkt und lebt, Umfass euch mit der Liebe holden Schranken, Und was!!!!!!\n"
+ "in schwankender Erscheinung schwebt, Befestigt mit dauernden Gedanken!!!!!!!\n"
+ "(Der Himmel schliesst, die Erzengel verteilen sich.)!!!!!!\n"
+ "MEPHISTOPHELES (allein): Von Zeit zu Zeit seh ich den Alten gern, Und!!!!!!\n"
+ "huete mich, mit ihm zu brechen. Es ist gar huebsch von einem grossen Herrn,!!!!!!\n"
+ "So menschlich mit dem Teufel selbst zu sprechen.!!!!!!";
}
/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package org.apache.flink.storm.split;
import org.apache.flink.storm.split.SpoutSplitExample.Enrich;
import org.apache.flink.storm.split.operators.VerifyAndEnrichBolt;
import org.apache.flink.test.util.AbstractTestBase;
import org.junit.After;
import org.junit.Assert;
import org.junit.Before;
import org.junit.Test;
import java.io.File;
import java.io.IOException;
/**
* Tests for split examples.
*/
public class SplitITCase extends AbstractTestBase {
private String output;
@Before
public void prepare() throws IOException {
output = getTempFilePath("dummy").split(":")[1];
}
@After
public void cleanUp() throws IOException {
deleteRecursively(new File(output));
}
@Test
public void testEmbeddedSpout() throws Exception {
SpoutSplitExample.main(new String[] { "0", output });
Assert.assertFalse(VerifyAndEnrichBolt.errorOccured);
Assert.assertFalse(Enrich.errorOccured);
}
}
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package org.apache.flink.storm.wordcount;
import org.apache.flink.test.testdata.WordCountData;
import org.apache.flink.test.util.AbstractTestBase;
import org.junit.Test;
/**
* Test for the BoltTokenizerWordCount example.
*/
public class BoltTokenizerWordCountITCase extends AbstractTestBase {
@Test
public void testProgram() throws Exception {
String textPath = createTempFile("text.txt", WordCountData.TEXT);
String resultPath = getTempDirPath("result");
BoltTokenizerWordCount.main(new String[]{textPath, resultPath});
compareResultsByLinesInMemory(WordCountData.STREAMING_COUNTS_AS_TUPLES, resultPath);
}
}
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package org.apache.flink.storm.wordcount;
import org.apache.flink.test.testdata.WordCountData;
import org.apache.flink.test.util.AbstractTestBase;
import org.junit.Test;
/**
* Test for the BoltTokenizerWordCountPojo example.
*/
public class BoltTokenizerWordCountPojoITCase extends AbstractTestBase {
@Test
public void testProgram() throws Exception {
String textPath = createTempFile("text.txt", WordCountData.TEXT);
String resultPath = getTempDirPath("result");
BoltTokenizerWordCountPojo.main(new String[]{textPath, resultPath});
compareResultsByLinesInMemory(WordCountData.STREAMING_COUNTS_AS_TUPLES, resultPath);
}
}
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package org.apache.flink.storm.wordcount;
import org.apache.flink.test.testdata.WordCountData;
import org.apache.flink.test.util.AbstractTestBase;
import org.junit.Test;
/**
* Test for the BoltTokenizerWordCountWithNames example.
*/
public class BoltTokenizerWordCountWithNamesITCase extends AbstractTestBase {
@Test
public void testProgram() throws Exception {
String textPath = createTempFile("text.txt", WordCountData.TEXT);
String resultPath = getTempDirPath("result");
BoltTokenizerWordCountWithNames.main(new String[]{textPath, resultPath});
compareResultsByLinesInMemory(WordCountData.STREAMING_COUNTS_AS_TUPLES, resultPath);
}
}
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package org.apache.flink.storm.wordcount;
import org.apache.flink.test.testdata.WordCountData;
import org.apache.flink.test.util.AbstractTestBase;
import org.junit.Test;
/**
* Test for the SpoutSourceWordCount example.
*/
public class SpoutSourceWordCountITCase extends AbstractTestBase {
@Test
public void testProgram() throws Exception {
String textPath = createTempFile("text.txt", WordCountData.TEXT);
String resultPath = getTempDirPath("result");
SpoutSourceWordCount.main(new String[]{textPath, resultPath});
compareResultsByLinesInMemory(WordCountData.STREAMING_COUNTS_AS_TUPLES, resultPath);
}
}
################################################################################
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
################################################################################
# Set root logger level to OFF and its only appender to A1.
log4j.rootLogger=OFF, A1
# A1 is set to be a ConsoleAppender.
log4j.appender.A1=org.apache.log4j.ConsoleAppender
# A1 uses PatternLayout.
log4j.appender.A1.layout=org.apache.log4j.PatternLayout
log4j.appender.A1.layout.ConversionPattern=%-4r [%t] %-5p %c %x - %m%n
# flink-storm
`flink-storm` is a compatibility layer for Apache Storm that allows embedding Spouts or Bolts unmodified within a regular Flink streaming program (`SpoutWrapper` and `BoltWrapper`).
Additionally, a whole Storm topology can be submitted to Flink (see `FlinkLocalCluster` and `FlinkSubmitter`).
Only a few minor changes to the original submission code are required.
The code that builds the topology itself can be reused unmodified. See `flink-storm-examples` for a simple word-count example, and the sketch below for the embedding API.
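As a minimal, hedged sketch of the embedding API (not part of `flink-storm` itself: `WordCountBolt` stands for any user-defined `IRichBolt` with a two-field output, and `inputPath` and the operator name are illustrative):

```java
// Hedged sketch: embed a Storm bolt into a regular Flink streaming program.
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
DataStream<String> text = env.readTextFile(inputPath);

DataStream<Tuple2<String, Integer>> counts = text.transform(
        "wordCountBolt",                                 // operator name
        TypeExtractor.getForObject(new Tuple2<>("", 0)), // declared output type
        new BoltWrapper<String, Tuple2<String, Integer>>(new WordCountBolt()));

counts.print();
env.execute("embedded-bolt-example");
```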
**Please note**: Do not add `storm-core` as a dependency. It is already included via `flink-storm`.
The following Storm features are not (yet/fully) supported by the compatibility layer:
* no fault-tolerance guarantees (ie, calls to `ack()`/`fail()` and anchoring are ignored)
* for whole Storm topologies the following is not supported by Flink:
  * direct emit connection pattern
  * activating/deactivating and rebalancing of topologies
  * task hooks
  * metrics
<?xml version="1.0" encoding="UTF-8"?>
<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
<modelVersion>4.0.0</modelVersion>
<parent>
<groupId>org.apache.flink</groupId>
<artifactId>flink-contrib</artifactId>
<version>1.8-SNAPSHOT</version>
<relativePath>..</relativePath>
</parent>
<artifactId>flink-storm_${scala.binary.version}</artifactId>
<name>flink-storm</name>
<packaging>jar</packaging>
<repositories>
<!-- This repository is needed as a stable source for some Clojure libraries -->
<repository>
<id>clojars</id>
<url>https://clojars.org/repo/</url>
<releases>
<enabled>true</enabled>
</releases>
<snapshots>
<enabled>false</enabled>
</snapshots>
</repository>
</repositories>
<dependencies>
<!-- core dependencies -->
<!-- Together with the dependency management section in flink-parent, this
pins the Kryo version of transitive dependencies to the Flink Kryo version -->
<dependency>
<groupId>com.esotericsoftware.kryo</groupId>
<artifactId>kryo</artifactId>
<scope>provided</scope>
</dependency>
<!-- Core streaming API -->
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-streaming-java_${scala.binary.version}</artifactId>
<version>${project.version}</version>
<scope>provided</scope>
</dependency>
<!-- we only need the Apache Storm API, not all the runtime and web UI functionality,
so we exclude many of the unnecessary and possibly conflicting dependencies -->
<dependency>
<groupId>org.apache.storm</groupId>
<artifactId>storm-core</artifactId>
<version>1.0.0</version>
<exclusions>
<exclusion>
<groupId>org.slf4j</groupId>
<artifactId>log4j-over-slf4j</artifactId>
</exclusion>
<exclusion>
<groupId>org.apache.logging.log4j</groupId>
<artifactId>log4j-slf4j-impl</artifactId>
</exclusion>
<exclusion>
<artifactId>slf4j-log4j12</artifactId>
<groupId>org.slf4j</groupId>
</exclusion>
<exclusion>
<groupId>javax.servlet</groupId>
<artifactId>servlet-api</artifactId>
</exclusion>
<exclusion>
<groupId>junit</groupId>
<artifactId>junit</artifactId>
</exclusion>
<exclusion>
<groupId>org.mockito</groupId>
<artifactId>mockito-all</artifactId>
</exclusion>
<exclusion>
<groupId>org.apache.curator</groupId>
<artifactId>curator-test</artifactId>
</exclusion>
<exclusion>
<groupId>com.esotericsoftware</groupId>
<artifactId>kryo</artifactId>
</exclusion>
<exclusion>
<groupId>ring</groupId>
<artifactId>ring-core</artifactId>
</exclusion>
<exclusion>
<groupId>ring</groupId>
<artifactId>ring-devel</artifactId>
</exclusion>
<exclusion>
<groupId>ring</groupId>
<artifactId>ring-servlet</artifactId>
</exclusion>
<exclusion>
<groupId>ring</groupId>
<artifactId>ring-jetty-adapter</artifactId>
</exclusion>
<exclusion>
<groupId>org.mortbay.jetty</groupId>
<artifactId>jetty</artifactId>
</exclusion>
<exclusion>
<groupId>org.jgrapht</groupId>
<artifactId>jgrapht-core</artifactId>
</exclusion>
<exclusion>
<groupId>compojure</groupId>
<artifactId>compojure</artifactId>
</exclusion>
<exclusion>
<groupId>com.twitter</groupId>
<artifactId>chill-java</artifactId>
</exclusion>
<exclusion>
<groupId>commons-fileupload</groupId>
<artifactId>commons-fileupload</artifactId>
</exclusion>
<exclusion>
<groupId>clout</groupId>
<artifactId>clout</artifactId>
</exclusion>
<exclusion>
<groupId>hiccup</groupId>
<artifactId>hiccup</artifactId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>com.googlecode.json-simple</groupId>
<artifactId>json-simple</artifactId>
<version>1.1</version>
</dependency>
<!-- test dependencies -->
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-runtime_${scala.binary.version}</artifactId>
<version>${project.version}</version>
<type>test-jar</type>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-streaming-java_${scala.binary.version}</artifactId>
<version>${project.version}</version>
<scope>test</scope>
<type>test-jar</type>
</dependency>
</dependencies>
</project>
/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package org.apache.flink.storm.util;
import org.apache.storm.topology.IRichSpout;
/**
* This interface represents a spout that emits a finite number of records. Common spouts emit infinite streams by
* default. To change this behavior and take advantage of Flink's finite-source capabilities, the spout should implement
* this interface.
*/
public interface FiniteSpout extends IRichSpout {
/**
* Checks whether the spout has reached the end of its stream.
*
* @return true, if the spout's stream reached its end, false otherwise
*/
boolean reachedEnd();
}
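A hedged implementation sketch (the class name, the emitted range, and the output field name are illustrative assumptions, not part of flink-storm):
import org.apache.storm.spout.SpoutOutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichSpout;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Values;
import java.util.Map;
/**
 * Illustrative {@link FiniteSpout} that emits the numbers 0..99 and then signals end-of-stream.
 */
public class RangeSpout extends BaseRichSpout implements FiniteSpout {
private static final long serialVersionUID = 1L;
private SpoutOutputCollector collector;
private int counter = 0;
@Override
public void open(@SuppressWarnings("rawtypes") Map conf, TopologyContext context, SpoutOutputCollector collector) {
this.collector = collector;
}
@Override
public void nextTuple() {
// only emit while the finite range is not exhausted
if (!reachedEnd()) {
this.collector.emit(new Values(this.counter++));
}
}
@Override
public void declareOutputFields(OutputFieldsDeclarer declarer) {
declarer.declare(new Fields("number"));
}
@Override
public boolean reachedEnd() {
// Flink stops calling nextTuple() once this returns true.
return this.counter >= 100;
}
}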
/**
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package org.apache.flink.storm.util;
import org.apache.storm.spout.SpoutOutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.IRichSpout;
import org.apache.storm.topology.OutputFieldsDeclarer;
import java.util.Map;
/**
* {@link NullTerminatingSpout} is a finite spout (ie, it implements the {@link FiniteSpout} interface) that wraps an
* infinite spout, and returns {@code true} in {@link #reachedEnd()} when the wrapped spout does not emit a tuple
* in {@code nextTuple()} for the first time.
*/
public class NullTerminatingSpout implements FiniteSpout {
private static final long serialVersionUID = -6976210409932076066L;
/** The original infinite Spout. */
private final IRichSpout spout;
/** The observer that checks whether the given spout emits a tuple in nextTuple() or not. */
private SpoutOutputCollectorObserver observer;
public NullTerminatingSpout(IRichSpout spout) {
this.spout = spout;
}
@Override
public void open(@SuppressWarnings("rawtypes") Map conf, TopologyContext context, SpoutOutputCollector collector) {
this.observer = new SpoutOutputCollectorObserver(collector);
this.observer.emitted = true;
this.spout.open(conf, context, this.observer);
}
@Override
public void close() {
this.spout.close();
}
@Override
public void activate() {
this.spout.activate();
}
@Override
public void deactivate() {
this.spout.deactivate();
}
@Override
public void nextTuple() {
this.observer.emitted = false;
this.spout.nextTuple();
}
@Override
public void ack(Object msgId) {
this.spout.ack(msgId);
}
@Override
public void fail(Object msgId) {
this.spout.fail(msgId);
}
@Override
public void declareOutputFields(OutputFieldsDeclarer declarer) {
this.spout.declareOutputFields(declarer);
}
@Override
public Map<String, Object> getComponentConfiguration() {
return this.spout.getComponentConfiguration();
}
@Override
public boolean reachedEnd() {
return !this.observer.emitted;
}
}
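A hedged usage sketch (MyInfiniteSpout, env, and the single String output attribute are assumptions, not part of flink-storm):
// Hypothetical: MyInfiniteSpout is any user-provided IRichSpout that eventually stays silent.
FiniteSpout finiteSpout = new NullTerminatingSpout(new MyInfiniteSpout());
// Used as a Flink source (env is a StreamExecutionEnvironment), the stream ends as soon as
// the wrapped spout does not emit a tuple in nextTuple() for the first time.
DataStream<Tuple1<String>> stream = env.addSource(
new SpoutWrapper<Tuple1<String>>(finiteSpout),
TypeExtractor.getForObject(new Tuple1<>("")));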
/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package org.apache.flink.storm.util;
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.SplitStream;
/**
* Strips {@link SplitStreamType}{@code <T>} away, ie, extracts the wrapped record of type {@code T}. Can be used to get
* a "clean" stream from a Spout/Bolt that declared multiple output streams (after the streams got separated using
* {@link DataStream#split(org.apache.flink.streaming.api.collector.selector.OutputSelector) .split(...)} and
* {@link SplitStream#select(String...) .select(...)}).
*
* @param <T> The type of the wrapped record.
*/
public class SplitStreamMapper<T> implements MapFunction<SplitStreamType<T>, T> {
private static final long serialVersionUID = 3550359150160908564L;
@Override
public T map(SplitStreamType<T> value) throws Exception {
return value.value;
}
}
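A hedged sketch of the full split pattern (`boltOutput` and the stream names `odd`/`even` are illustrative assumptions): `StormStreamSelector` routes each wrapped record to its declared stream, and `SplitStreamMapper` strips the wrapper afterwards:
// Illustrative: boltOutput is a DataStream<SplitStreamType<Integer>> produced by an
// embedded Spout/Bolt that declares the two output streams "odd" and "even".
SplitStream<SplitStreamType<Integer>> splitStream =
boltOutput.split(new StormStreamSelector<Integer>());
DataStream<Integer> odd = splitStream.select("odd")
.map(new SplitStreamMapper<Integer>()).returns(Integer.class);
DataStream<Integer> even = splitStream.select("even")
.map(new SplitStreamMapper<Integer>()).returns(Integer.class);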
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package org.apache.flink.storm.util;
import org.apache.flink.streaming.api.datastream.DataStream;
/**
* Used by org.apache.flink.storm.wrappers.AbstractStormCollector to wrap
* output tuples if multiple output streams are declared. In this case, the Flink output data stream must be split via
* {@link DataStream#split(org.apache.flink.streaming.api.collector.selector.OutputSelector) .split(...)} using
* {@link StormStreamSelector}.
*
*/
public class SplitStreamType<T> {
/** The stream ID this tuple belongs to. */
public String streamId;
/** The actual data value. */
public T value;
@Override
public String toString() {
return "<sid:" + this.streamId + ",v:" + this.value + ">";
}
@Override
public boolean equals(Object o) {
if (this == o) {
return true;
}
if (o == null || getClass() != o.getClass()) {
return false;
}
SplitStreamType<?> other = (SplitStreamType<?>) o;
return this.streamId.equals(other.streamId) && this.value.equals(other.value);
}
@Override
public int hashCode() {
// keep consistent with equals(): combine stream ID and value
return 31 * this.streamId.hashCode() + this.value.hashCode();
}
}
/**
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package org.apache.flink.storm.util;
import org.apache.storm.spout.SpoutOutputCollector;
import org.apache.storm.utils.Utils;
import java.util.List;
/**
* Observes if a call to any {@code emit(...)} or {@code emitDirect(...)} method is made.
* The internal flag {@link #emitted} must be reset by the user manually.
*/
class SpoutOutputCollectorObserver extends SpoutOutputCollector {
/** The collector to be observed. */
private final SpoutOutputCollector delegate;
/** The internal flag that is set to {@code true} if a tuple gets emitted. */
boolean emitted;
public SpoutOutputCollectorObserver(SpoutOutputCollector delegate) {
super(null);
this.delegate = delegate;
}
@Override
public List<Integer> emit(String streamId, List<Object> tuple, Object messageId) {
emitted = true;
return this.delegate.emit(streamId, tuple, messageId);
}
@Override
public List<Integer> emit(List<Object> tuple, Object messageId) {
return emit(Utils.DEFAULT_STREAM_ID, tuple, messageId);
}
@Override
public List<Integer> emit(List<Object> tuple) {
return emit(tuple, null);
}
@Override
public List<Integer> emit(String streamId, List<Object> tuple) {
return emit(streamId, tuple, null);
}
@Override
public void emitDirect(int taskId, String streamId, List<Object> tuple, Object messageId) {
emitted = true;
delegate.emitDirect(taskId, streamId, tuple, messageId);
}
@Override
public void emitDirect(int taskId, List<Object> tuple, Object messageId) {
emitDirect(taskId, Utils.DEFAULT_STREAM_ID, tuple, messageId);
}
@Override
public void emitDirect(int taskId, String streamId, List<Object> tuple) {
emitDirect(taskId, streamId, tuple, null);
}
@Override
public void emitDirect(int taskId, List<Object> tuple) {
emitDirect(taskId, tuple, null);
}
@Override
public void reportError(Throwable error) {
delegate.reportError(error);
}
}
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package org.apache.flink.storm.util;
import org.apache.flink.api.common.ExecutionConfig.GlobalJobParameters;
import org.apache.storm.Config;
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
/**
* {@link StormConfig} is used to provide a user-defined Storm configuration (ie, a raw {@link Map} or {@link Config}
* object) for embedded Spouts and Bolts.
*/
@SuppressWarnings("rawtypes")
public final class StormConfig extends GlobalJobParameters implements Map {
private static final long serialVersionUID = 8019519109673698490L;
/** Contains the actual configuration that is provided to Spouts and Bolts. */
private final Map config = new HashMap();
/**
* Creates an empty configuration.
*/
public StormConfig() {
}
/**
* Creates a configuration with initial values provided by the given {@code Map}.
*
* @param config
* Initial values for this configuration.
*/
@SuppressWarnings("unchecked")
public StormConfig(Map config) {
this.config.putAll(config);
}
@Override
public int size() {
return this.config.size();
}
@Override
public boolean isEmpty() {
return this.config.isEmpty();
}
@Override
public boolean containsKey(Object key) {
return this.config.containsKey(key);
}
@Override
public boolean containsValue(Object value) {
return this.config.containsValue(value);
}
@Override
public Object get(Object key) {
return this.config.get(key);
}
@SuppressWarnings("unchecked")
@Override
public Object put(Object key, Object value) {
return this.config.put(key, value);
}
@Override
public Object remove(Object key) {
return this.config.remove(key);
}
@SuppressWarnings("unchecked")
@Override
public void putAll(Map m) {
this.config.putAll(m);
}
@Override
public void clear() {
this.config.clear();
}
@SuppressWarnings("unchecked")
@Override
public Set<Object> keySet() {
return this.config.keySet();
}
@SuppressWarnings("unchecked")
@Override
public Collection<Object> values() {
return this.config.values();
}
@SuppressWarnings("unchecked")
@Override
public Set<java.util.Map.Entry<Object, Object>> entrySet() {
return this.config.entrySet();
}
}
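A hedged usage sketch (the configuration entry is illustrative): the StormConfig is registered as Flink global job parameters, from where embedded Spouts and Bolts receive it in open() and prepare(), respectively:
// Illustrative: provide a Storm configuration to all embedded Spouts and Bolts.
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
StormConfig stormConfig = new StormConfig();
stormConfig.put(Config.TOPOLOGY_NAME, "myTopologyName"); // hypothetical entry
env.getConfig().setGlobalJobParameters(stormConfig);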
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package org.apache.flink.storm.util;
import org.apache.flink.streaming.api.collector.selector.OutputSelector;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
/**
* Used to split multiple declared output streams within Flink.
*/
public final class StormStreamSelector<T> implements OutputSelector<SplitStreamType<T>> {
private static final long serialVersionUID = 2553423379715401023L;
/** Internal cache to avoid short-lived ArrayList objects. */
private final HashMap<String, List<String>> streams = new HashMap<String, List<String>>();
@Override
public Iterable<String> select(SplitStreamType<T> value) {
String sid = value.streamId;
List<String> streamId = this.streams.get(sid);
if (streamId == null) {
streamId = new ArrayList<String>(1);
streamId.add(sid);
this.streams.put(sid, streamId);
}
return streamId;
}
}
/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package org.apache.flink.storm.wrappers;
import org.apache.flink.api.java.tuple.Tuple;
import org.apache.flink.api.java.tuple.Tuple0;
import org.apache.flink.api.java.tuple.Tuple25;
import org.apache.flink.storm.util.SplitStreamType;
import java.util.HashMap;
import java.util.List;
import java.util.Map.Entry;
/**
* An {@link AbstractStormCollector} transforms Storm tuples to Flink tuples.
*/
abstract class AbstractStormCollector<OUT> {
/** Flink output tuple of concrete type {@link Tuple0} to {@link Tuple25} per output stream. */
protected final HashMap<String, Tuple> outputTuple = new HashMap<String, Tuple>();
/** Flink split tuple. Used, if multiple output streams are declared. */
private final SplitStreamType<Object> splitTuple = new SplitStreamType<Object>();
/**
* The number of attributes of the output tuples per stream. (Determines the concrete type of {@link #outputTuple}).
* If {@link #numberOfAttributes} is negative, {@link #outputTuple} is not used and the "raw" data type is used.
*/
protected final HashMap<String, Integer> numberOfAttributes;
/** Indicates whether multiple output streams are declared and thus {@link SplitStreamType} must be used as output. */
private final boolean split;
/** The ID of the producer task. */
private final int taskId;
/** Is set to {@code true} each time a tuple is emitted. */
boolean tupleEmitted = false;
/**
* Instantiates a new {@link AbstractStormCollector} that emits Flink tuples via {@link #doEmit(Object)}. If the
* number of attributes is negative, any output type is supported (ie, raw type). If the number of attributes is
* between 0 and 25, the output type is {@link Tuple0} to {@link Tuple25}, respectively.
*
* @param numberOfAttributes
* The number of attributes of the emitted tuples per output stream.
* @param taskId
* The ID of the producer task (negative value for unknown).
* @throws UnsupportedOperationException
* if the specified number of attributes is greater than 25 or taskId support is enabled for a raw
* stream
*/
AbstractStormCollector(final HashMap<String, Integer> numberOfAttributes, final int taskId)
throws UnsupportedOperationException {
assert (numberOfAttributes != null);
this.numberOfAttributes = numberOfAttributes;
this.split = this.numberOfAttributes.size() > 1;
this.taskId = taskId;
for (Entry<String, Integer> outputStream : numberOfAttributes.entrySet()) {
int numAtt = outputStream.getValue();
if (this.taskId >= 0) {
if (numAtt < 0) {
throw new UnsupportedOperationException(
"Task ID transmission not supported for raw streams: "
+ outputStream.getKey());
}
++numAtt;
}
if (numAtt > 25) {
if (this.taskId >= 0) {
throw new UnsupportedOperationException(
"Flink cannot handle more then 25 attributes, but 25 (24 plus 1 for produer task ID) "
+ " are declared for stream '" + outputStream.getKey() + "' by the given bolt.");
} else {
throw new UnsupportedOperationException(
"Flink cannot handle more then 25 attributes, but " + numAtt
+ " are declared for stream '" + outputStream.getKey() + "' by the given bolt.");
}
} else if (numAtt >= 0) {
try {
this.outputTuple.put(outputStream.getKey(),
org.apache.flink.api.java.tuple.Tuple.getTupleClass(numAtt)
.newInstance());
} catch (final InstantiationException e) {
throw new RuntimeException(e);
} catch (final IllegalAccessException e) {
throw new RuntimeException(e);
}
}
}
}
/**
* Transforms a Storm tuple into a Flink tuple of type {@code OUT} and emits this tuple via {@link #doEmit(Object)}
* to the specified output stream.
*
* @param streamId
* The output stream id.
* @param tuple
* The Storm tuple to be emitted.
* @return the return value of {@link #doEmit(Object)}
*/
@SuppressWarnings("unchecked")
protected final List<Integer> tansformAndEmit(final String streamId, final List<Object> tuple) {
List<Integer> taskIds;
int numAtt = this.numberOfAttributes.get(streamId);
int taskIdIdx = numAtt;
if (this.taskId >= 0 && numAtt < 0) {
numAtt = 1;
taskIdIdx = 0;
}
if (numAtt >= 0) {
assert (tuple.size() == numAtt);
Tuple out = this.outputTuple.get(streamId);
for (int i = 0; i < numAtt; ++i) {
out.setField(tuple.get(i), i);
}
if (this.taskId >= 0) {
out.setField(this.taskId, taskIdIdx);
}
if (this.split) {
this.splitTuple.streamId = streamId;
this.splitTuple.value = out;
taskIds = doEmit((OUT) this.splitTuple);
} else {
taskIds = doEmit((OUT) out);
}
} else {
assert (tuple.size() == 1);
if (this.split) {
this.splitTuple.streamId = streamId;
this.splitTuple.value = tuple.get(0);
taskIds = doEmit((OUT) this.splitTuple);
} else {
taskIds = doEmit((OUT) tuple.get(0));
}
}
this.tupleEmitted = true;
return taskIds;
}
/**
* Emits a Flink tuple.
*
* @param flinkTuple
* The tuple to be emitted.
* @return the IDs of the tasks this tuple was sent to
*/
protected abstract List<Integer> doEmit(OUT flinkTuple);
}
/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package org.apache.flink.storm.wrappers;
import org.apache.flink.api.java.tuple.Tuple0;
import org.apache.flink.api.java.tuple.Tuple25;
import org.apache.flink.streaming.api.operators.Output;
import org.apache.flink.util.Collector;
import org.apache.storm.task.IOutputCollector;
import org.apache.storm.tuple.Tuple;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
/**
* A {@link BoltCollector} is used by {@link BoltWrapper} to provide a Storm-compatible
* output collector to the wrapped bolt. It transforms the emitted Storm tuples into Flink tuples
* and emits them via the provided {@link Output} object.
*/
class BoltCollector<OUT> extends AbstractStormCollector<OUT> implements IOutputCollector {
/** The Flink output Collector. */
private final Collector<OUT> flinkOutput;
/**
* Instantiates a new {@link BoltCollector} that emits Flink tuples to the given Flink output object. If the
* number of attributes is negative, any output type is supported (ie, raw type). If the number of attributes is
* between 0 and 25, the output type is {@link Tuple0} to {@link Tuple25}, respectively.
*
* @param numberOfAttributes
* The number of attributes of the emitted tuples per output stream.
* @param taskId
* The ID of the producer task (negative value for unknown).
* @param flinkOutput
* The Flink output object to be used.
* @throws UnsupportedOperationException
* if the specified number of attributes is greater than 25
*/
BoltCollector(final HashMap<String, Integer> numberOfAttributes, final int taskId,
final Collector<OUT> flinkOutput) throws UnsupportedOperationException {
super(numberOfAttributes, taskId);
assert (flinkOutput != null);
this.flinkOutput = flinkOutput;
}
@Override
protected List<Integer> doEmit(final OUT flinkTuple) {
this.flinkOutput.collect(flinkTuple);
// TODO
return null;
}
@Override
public void reportError(final Throwable error) {
// not sure, if Flink can support this
}
@Override
public List<Integer> emit(final String streamId, final Collection<Tuple> anchors, final List<Object> tuple) {
return this.tansformAndEmit(streamId, tuple);
}
@Override
public void emitDirect(final int taskId, final String streamId, final Collection<Tuple> anchors, final List<Object> tuple) {
throw new UnsupportedOperationException("Direct emit is not supported by Flink");
}
@Override
public void ack(final Tuple input) {}
@Override
public void fail(final Tuple input) {}
@Override
public void resetTimeout(Tuple var1) {}
}
/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package org.apache.flink.storm.wrappers;
import org.apache.flink.api.common.ExecutionConfig.GlobalJobParameters;
import org.apache.flink.api.java.tuple.Tuple0;
import org.apache.flink.api.java.tuple.Tuple1;
import org.apache.flink.api.java.tuple.Tuple25;
import org.apache.flink.storm.util.StormConfig;
import org.apache.flink.streaming.api.operators.AbstractStreamOperator;
import org.apache.flink.streaming.api.operators.OneInputStreamOperator;
import org.apache.flink.streaming.api.operators.TimestampedCollector;
import org.apache.flink.streaming.runtime.streamrecord.StreamRecord;
import org.apache.storm.task.OutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.IRichBolt;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.MessageId;
import org.apache.storm.utils.Utils;
import java.util.Collection;
import java.util.HashMap;
import static java.util.Arrays.asList;
/**
* A {@link BoltWrapper} wraps an {@link IRichBolt} in order to execute the Storm bolt within a Flink Streaming program.
* It takes the Flink input tuples of type {@code IN} and transforms them into {@link StormTuple}s that the bolt can
* process. Furthermore, it takes the bolt's output tuples and transforms them into Flink tuples of type {@code OUT}
* (see {@link AbstractStormCollector} for supported types).<br/>
* <br/>
* <strong>Works for single input streams only! See {@link MergedInputsBoltWrapper} for multi-input stream
* Bolts.</strong>
*/
public class BoltWrapper<IN, OUT> extends AbstractStreamOperator<OUT> implements OneInputStreamOperator<IN, OUT> {
private static final long serialVersionUID = -4788589118464155835L;
/** The default input component ID. */
public static final String DEFAULT_ID = "default ID";
/** The default bolt ID. */
public static final String DEFUALT_BOLT_NAME = "Unnamed Bolt";
/** The wrapped Storm {@link IRichBolt bolt}. */
protected final IRichBolt bolt;
/** The name of the bolt. */
private final String name;
/** Number of attributes of the bolt's output tuples per stream. */
private final HashMap<String, Integer> numberOfAttributes;
/** The topology context of the bolt. */
private transient TopologyContext topologyContext;
/** The schema (ie, ordered field names) of the input streams per producer taskID. */
private final HashMap<Integer, Fields> inputSchemas = new HashMap<Integer, Fields>();
/**
* We have to use this because Operators must output {@link StreamRecord}.
*/
protected transient TimestampedCollector<OUT> flinkCollector;
/**
* Instantiates a new {@link BoltWrapper} that wraps the given Storm {@link IRichBolt bolt} such that it can be used
* within a Flink streaming program. As no input schema is defined, attribute-by-name access is only possible for
* POJO input types. The output type will be one of {@link Tuple0} to {@link Tuple25} depending on the bolt's
* declared number of attributes.
*
* @param bolt
* The Storm {@link IRichBolt bolt} to be used.
* @throws IllegalArgumentException
* If the number of declared output attributes is not within range [0;25].
*/
public BoltWrapper(final IRichBolt bolt) throws IllegalArgumentException {
this(bolt, null, (Collection<String>) null);
}
/**
* Instantiates a new {@link BoltWrapper} that wraps the given Storm {@link IRichBolt bolt} such that it can be used
* within a Flink streaming program. The given input schema enables attribute-by-name access for input types
* {@link Tuple0} to {@link Tuple25}. The output type will be one of {@link Tuple0} to {@link Tuple25} depending on
* the bolt's declared number of attributes.
*
* @param bolt
* The Storm {@link IRichBolt bolt} to be used.
* @param inputSchema
* The schema (ie, ordered field names) of the input stream.
* @throws IllegalArgumentException
* If the number of declared output attributes is not within range [0;25].
*/
public BoltWrapper(final IRichBolt bolt, final Fields inputSchema)
throws IllegalArgumentException {
this(bolt, inputSchema, (Collection<String>) null);
}
/**
* Instantiates a new {@link BoltWrapper} that wraps the given Storm {@link IRichBolt bolt} such that it can be used
* within a Flink streaming program. As no input schema is defined, attribute-by-name access is only possible for
* POJO input types. The output type can be any type if parameter {@code rawOutput} is {@code true} and the bolt's
* number of declared output tuples is 1. If {@code rawOutput} is {@code false} the output type will be one of
* {@link Tuple0} to {@link Tuple25} depending on the bolt's declared number of attributes.
*
* @param bolt
* The Storm {@link IRichBolt bolt} to be used.
* @param rawOutputs
* Contains the names of all output streams that have a single attribute and should not be of type
* {@link Tuple1} but of a raw type.
* @throws IllegalArgumentException
* If {@code rawOutput} is {@code true} and the number of declared output attributes is not 1 or if
* {@code rawOutput} is {@code false} and the number of declared output attributes is not within range
* [1;25].
*/
public BoltWrapper(final IRichBolt bolt, final String[] rawOutputs)
throws IllegalArgumentException {
this(bolt, null, asList(rawOutputs));
}
/**
* Instantiates a new {@link BoltWrapper} that wraps the given Storm {@link IRichBolt bolt} such that it can be used
* within a Flink streaming program. As no input schema is defined, attribute-by-name access is only possible for
* POJO input types. The output type can be any type if parameter {@code rawOutput} is {@code true} and the bolt's
* number of declared output tuples is 1. If {@code rawOutput} is {@code false} the output type will be one of
* {@link Tuple0} to {@link Tuple25} depending on the bolt's declared number of attributes.
*
* @param bolt
* The Storm {@link IRichBolt bolt} to be used.
* @param rawOutputs
* Contains the names of all output streams that have a single attribute and should not be of type
* {@link Tuple1} but of a raw type.
* @throws IllegalArgumentException
* If {@code rawOutput} is {@code true} and the number of declared output attributes is not 1 or if
* {@code rawOutput} is {@code false} and the number of declared output attributes is not within range
* [1;25].
*/
public BoltWrapper(final IRichBolt bolt, final Collection<String> rawOutputs) throws IllegalArgumentException {
this(bolt, null, rawOutputs);
}
/**
* Instantiates a new {@link BoltWrapper} that wraps the given Storm {@link IRichBolt bolt} such that it can be used
* within a Flink streaming program. The given input schema enables attribute-by-name access for input types
* {@link Tuple0} to {@link Tuple25}. The output type can be any type if parameter {@code rawOutput} is {@code true}
* and the bolt's number of declared output tuples is 1. If {@code rawOutput} is {@code false} the output type will
* be one of {@link Tuple0} to {@link Tuple25} depending on the bolt's declared number of attributes.
*
* @param bolt
* The Storm {@link IRichBolt bolt} to be used.
* @param inputSchema
* The schema (ie, ordered field names) of the input stream.
* @param rawOutputs
* Contains the names of all output streams that have a single attribute and should not be of type
* {@link Tuple1} but of a raw type.
* @throws IllegalArgumentException
* If {@code rawOutput} is {@code true} and the number of declared output attributes is not 1 or if
* {@code rawOutput} is {@code false} and the number of declared output attributes is not within range
* [0;25].
*/
public BoltWrapper(
final IRichBolt bolt,
final Fields inputSchema,
final String[] rawOutputs)
throws IllegalArgumentException {
this(bolt, inputSchema, asList(rawOutputs));
}
/**
* Instantiates a new {@link BoltWrapper} that wraps the given Storm {@link IRichBolt bolt} such that it can be used
* within a Flink streaming program. The given input schema enables attribute-by-name access for input types
* {@link Tuple0} to {@link Tuple25}. The output type can be any type if parameter {@code rawOutput} is {@code true}
* and the bolt's number of declared output tuples is 1. If {@code rawOutput} is {@code false} the output type will
* be one of {@link Tuple0} to {@link Tuple25} depending on the bolt's declared number of attributes.
*
* @param bolt
* The Storm {@link IRichBolt bolt} to be used.
* @param inputSchema
* The schema (ie, ordered field names) of the input stream.
* @param rawOutputs
* Contains the names of all output streams that have a single attribute and should not be of type
* {@link Tuple1} but of a raw type.
* @throws IllegalArgumentException
* If {@code rawOutput} is {@code true} and the number of declared output attributes is not 1 or if
* {@code rawOutput} is {@code false} and the number of declared output attributes is not within range
* [0;25].
*/
public BoltWrapper(final IRichBolt bolt, final Fields inputSchema,
final Collection<String> rawOutputs) throws IllegalArgumentException {
this(bolt, DEFUALT_BOLT_NAME, Utils.DEFAULT_STREAM_ID, DEFAULT_ID, inputSchema, rawOutputs);
}
/**
* Instantiates a new {@link BoltWrapper} that wraps the given Storm {@link IRichBolt bolt} such that it can be used
* within a Flink streaming program. The given input schema enables attribute-by-name access for input types
* {@link Tuple0} to {@link Tuple25}. The output type can be any type if parameter {@code rawOutput} is {@code true}
* and the bolt's number of declared output tuples is 1. If {@code rawOutput} is {@code false} the output type will
* be one of {@link Tuple0} to {@link Tuple25} depending on the bolt's declared number of attributes.
*
* @param bolt
* The Storm {@link IRichBolt bolt} to be used.
* @param name
* The name of the bolt.
* @param inputStreamId
* The stream ID of the input stream that is consumed by the given bolt.
* @param inputComponentId
* The component ID of the producer of the input stream.
* @param inputSchema
* The schema (ie, ordered field names) of the input stream.
* @param rawOutputs
* Contains the names of all output streams that have a single attribute and should not be of type
* {@link Tuple1} but of a raw type.
* @throws IllegalArgumentException
* If {@code rawOutput} is {@code true} and the number of declared output attributes is not 1 or if
* {@code rawOutput} is {@code false} and the number of declared output attributes is not within range
* [0;25].
*/
public BoltWrapper(final IRichBolt bolt, final String name, final String inputStreamId,
final String inputComponentId, final Fields inputSchema,
final Collection<String> rawOutputs) throws IllegalArgumentException {
this.bolt = bolt;
this.name = name;
this.inputSchemas.put(null, inputSchema);
this.numberOfAttributes = WrapperSetupHelper.getNumberOfAttributes(bolt, rawOutputs);
}
@Override
public void open() throws Exception {
super.open();
this.flinkCollector = new TimestampedCollector<>(this.output);
GlobalJobParameters config = getExecutionConfig().getGlobalJobParameters();
StormConfig stormConfig = new StormConfig();
if (config != null) {
if (config instanceof StormConfig) {
stormConfig = (StormConfig) config;
} else {
stormConfig.putAll(config.toMap());
}
}
this.topologyContext = WrapperSetupHelper.createTopologyContext(
getRuntimeContext(), this.bolt, this.name, stormConfig);
final OutputCollector stormCollector = new OutputCollector(new BoltCollector<OUT>(
this.numberOfAttributes, this.topologyContext.getThisTaskId(), this.flinkCollector));
this.bolt.prepare(stormConfig, this.topologyContext, stormCollector);
}
@Override
public void dispose() throws Exception {
super.dispose();
this.bolt.cleanup();
}
@Override
public void processElement(final StreamRecord<IN> element) throws Exception {
this.flinkCollector.setTimestamp(element);
IN value = element.getValue();
this.bolt.execute(new StormTuple<>(value, this.inputSchemas.get(null), -1, null, null,
MessageId.makeUnanchored()));
}
}
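A hedged sketch of the input-schema variant (MyTokenizerBolt, text, and the field name "sentence" are assumptions, not part of flink-storm): declaring the input schema lets the wrapped bolt access its input attribute by name, eg, via tuple.getStringByField("sentence"):
// Illustrative: text is a DataStream<String>; the wrapped bolt reads its input
// attribute by the declared name "sentence" instead of by index.
DataStream<Tuple2<String, Integer>> counts = text.transform(
"tokenizerBolt",
TypeExtractor.getForObject(new Tuple2<>("", 0)),
new BoltWrapper<String, Tuple2<String, Integer>>(
new MyTokenizerBolt(), new Fields("sentence")));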
/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package org.apache.flink.storm.wrappers;
import clojure.lang.Atom;
import org.apache.storm.generated.StormTopology;
import org.apache.storm.hooks.ITaskHook;
import org.apache.storm.metric.api.CombinedMetric;
import org.apache.storm.metric.api.ICombiner;
import org.apache.storm.metric.api.IMetric;
import org.apache.storm.metric.api.IReducer;
import org.apache.storm.metric.api.ReducedMetric;
import org.apache.storm.state.ISubscribedState;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.tuple.Fields;
import java.util.Collection;
import java.util.List;
import java.util.Map;
/**
* {@link FlinkTopologyContext} is a {@link TopologyContext} that overrides certain methods that are not applicable when
* a Storm topology is executed within Flink.
*/
final class FlinkTopologyContext extends TopologyContext {
/**
* Instantiates a new {@link FlinkTopologyContext} for a given Storm topology. The context object is instantiated
* for each parallel task.
*/
FlinkTopologyContext(final StormTopology topology,
@SuppressWarnings("rawtypes") final Map stormConf,
final Map<Integer, String> taskToComponent, final Map<String, List<Integer>> componentToSortedTasks,
final Map<String, Map<String, Fields>> componentToStreamToFields, final String stormId, final String codeDir,
final String pidDir, final Integer taskId, final Integer workerPort, final List<Integer> workerTasks,
final Map<String, Object> defaultResources, final Map<String, Object> userResources,
final Map<String, Object> executorData, @SuppressWarnings("rawtypes") final Map registeredMetrics,
final Atom openOrPrepareWasCalled) {
super(topology, stormConf, taskToComponent, componentToSortedTasks, componentToStreamToFields, stormId,
codeDir, pidDir, taskId, workerPort, workerTasks, defaultResources, userResources, executorData,
registeredMetrics, openOrPrepareWasCalled);
}
/**
* Not supported by Flink.
*
* @throws UnsupportedOperationException
* at every invocation
*/
@Override
public void addTaskHook(final ITaskHook hook) {
throw new UnsupportedOperationException("Task hooks are not supported by Flink");
}
/**
* Not supported by Flink.
*
* @throws UnsupportedOperationException
* at every invocation
*/
@Override
public Collection<ITaskHook> getHooks() {
throw new UnsupportedOperationException("Task hooks are not supported by Flink");
}
/**
* Not supported by Flink.
*
* @throws UnsupportedOperationException
* at every invocation
*/
@Override
public IMetric getRegisteredMetricByName(final String name) {
throw new UnsupportedOperationException("Metrics are not supported by Flink");
}
/**
* Not supported by Flink.
*
* @throws UnsupportedOperationException
* at every invocation
*/
@SuppressWarnings("rawtypes")
@Override
public CombinedMetric registerMetric(final String name, final ICombiner combiner, final int timeBucketSizeInSecs) {
throw new UnsupportedOperationException("Metrics are not supported by Flink");
}
/**
* Not supported by Flink.
*
* @throws UnsupportedOperationException
* at every invocation
*/
@SuppressWarnings("rawtypes")
@Override
public ReducedMetric registerMetric(final String name, final IReducer combiner, final int timeBucketSizeInSecs) {
throw new UnsupportedOperationException("Metrics are not supported by Flink");
}
/**
* Not supported by Flink.
*
* @throws UnsupportedOperationException
* at every invocation
*/
@Override
public <T extends IMetric> T registerMetric(final String name, final T metric, final int timeBucketSizeInSecs) {
throw new UnsupportedOperationException("Metrics are not supported by Flink");
}
/**
* Not supported by Flink.
*
* @throws UnsupportedOperationException
* at every invocation
*/
@Override
public <T extends ISubscribedState> T setAllSubscribedState(final T obj) {
throw new UnsupportedOperationException("Not supported by Flink");
}
/**
* Not supported by Flink.
*
* @throws UnsupportedOperationException
* at every invocation
*/
@Override
public <T extends ISubscribedState> T setSubscribedState(final String componentId, final T obj) {
throw new UnsupportedOperationException("Not supported by Flink");
}
/**
* Not supported by Flink.
*
* @throws UnsupportedOperationException
* at every invocation
*/
@Override
public <T extends ISubscribedState> T setSubscribedState(final String componentId, final String streamId, final T
obj) {
throw new UnsupportedOperationException("Not supported by Flink");
}
}
/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package org.apache.flink.storm.wrappers;
import org.apache.flink.api.java.tuple.Tuple0;
import org.apache.flink.api.java.tuple.Tuple1;
import org.apache.flink.api.java.tuple.Tuple25;
import org.apache.flink.streaming.runtime.streamrecord.StreamRecord;
import org.apache.storm.topology.IRichBolt;
import java.util.Collection;
import static java.util.Arrays.asList;
/**
* A {@link MergedInputsBoltWrapper} is a {@link BoltWrapper} that expects input tuples of type {@link StormTuple}. It
* can be used to wrap a multi-input bolt and assumes that all input streams have already been merged into a single
* {@link StormTuple} stream.
*/
public final class MergedInputsBoltWrapper<IN, OUT> extends BoltWrapper<StormTuple<IN>, OUT> {
private static final long serialVersionUID = 6399319187892878545L;
/**
* Instantiates a new {@link MergedInputsBoltWrapper} that wraps the given Storm {@link IRichBolt bolt} such that it
* can be used within a Flink streaming program. The output type will be one of {@link Tuple0} to {@link Tuple25}
* depending on the bolt's declared number of attributes.
*
* @param bolt
* The Storm {@link IRichBolt bolt} to be used.
* @throws IllegalArgumentException
* If the number of declared output attributes is not within range [0;25].
*/
public MergedInputsBoltWrapper(final IRichBolt bolt) throws IllegalArgumentException {
super(bolt);
}
/**
* Instantiates a new {@link MergedInputsBoltWrapper} that wraps the given Storm {@link IRichBolt bolt} such that it
* can be used within a Flink streaming program. The output type can be any type if parameter {@code rawOutput} is
* {@code true} and the bolt's number of declared output tuples is 1. If {@code rawOutput} is {@code false} the
* output type will be one of {@link Tuple0} to {@link Tuple25} depending on the bolt's declared number of
* attributes.
*
* @param bolt
* The Storm {@link IRichBolt bolt} to be used.
* @param rawOutputs
* Contains the names of streams whose single-attribute output should not be of type {@link Tuple1} but
* of a raw type.
* @throws IllegalArgumentException
* If {@code rawOutput} is {@code true} and the number of declared output attributes is not 1 or if
* {@code rawOutput} is {@code false} and the number of declared output attributes is not within range
* [1;25].
*/
public MergedInputsBoltWrapper(final IRichBolt bolt, final String[] rawOutputs)
throws IllegalArgumentException {
super(bolt, asList(rawOutputs));
}
/**
* Instantiates a new {@link MergedInputsBoltWrapper} that wraps the given Storm {@link IRichBolt bolt} such that it
* can be used within a Flink streaming program. The output type can be any type if parameter {@code rawOutput} is
* {@code true} and the bolt's number of declared output tuples is 1. If {@code rawOutput} is {@code false} the
* output type will be one of {@link Tuple0} to {@link Tuple25} depending on the bolt's declared number of
* attributes.
*
* @param bolt
* The Storm {@link IRichBolt bolt} to be used.
* @param rawOutputs
* Contains the names of streams whose single-attribute output should not be of type {@link Tuple1} but
* of a raw type.
* @throws IllegalArgumentException
* If {@code rawOutput} is {@code true} and the number of declared output attributes is not 1 or if
* {@code rawOutput} is {@code false} and the number of declared output attributes is not within range
* [1;25].
*/
public MergedInputsBoltWrapper(final IRichBolt bolt, final Collection<String> rawOutputs)
throws IllegalArgumentException {
super(bolt, rawOutputs);
}
/**
* Instantiates a new {@link MergedInputsBoltWrapper} that wraps the given Storm {@link IRichBolt bolt} such that it
* can be used within a Flink streaming program. The output type can be any type if parameter {@code rawOutput} is
* {@code true} and the bolt's number of declared output tuples is 1. If {@code rawOutput} is {@code false} the
* output type will be one of {@link Tuple0} to {@link Tuple25} depending on the bolt's declared number of
* attributes.
*
* @param bolt
* The Storm {@link IRichBolt bolt} to be used.
* @param name
* The name of the bolt.
* @param rawOutputs
* Contains the names of streams whose single-attribute output should not be of type {@link Tuple1} but
* of a raw type.
* @throws IllegalArgumentException
* If {@code rawOutput} is {@code true} and the number of declared output attributes is not 1 or if
* {@code rawOutput} is {@code false} and the number of declared output attributes is not within range
* [1;25].
*/
public MergedInputsBoltWrapper(final IRichBolt bolt, final String name, final Collection<String> rawOutputs)
throws IllegalArgumentException {
super(bolt, name, null, null, null, rawOutputs);
}
@Override
public void processElement(final StreamRecord<StormTuple<IN>> element) throws Exception {
this.flinkCollector.setTimestamp(element);
this.bolt.execute(element.getValue());
}
}
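To put the wrapper in context, here is a hedged usage sketch of embedding a multi-input bolt via `MergedInputsBoltWrapper`. The `buildMergedInput` helper and `createMultiInputBolt` factory are hypothetical stand-ins for application code, and declaring the default stream as a raw output assumes the bolt declares exactly one `String` attribute.

{% highlight java %}
import org.apache.flink.api.common.typeinfo.BasicTypeInfo;
import org.apache.flink.storm.wrappers.MergedInputsBoltWrapper;
import org.apache.flink.storm.wrappers.StormTuple;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.storm.topology.IRichBolt;
import org.apache.storm.utils.Utils;

public class MergedInputsSketch {

	public static void main(final String[] args) throws Exception {
		final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

		// All input streams must already be merged into one StormTuple stream.
		final DataStream<StormTuple<String>> mergedStream = buildMergedInput(env);

		// Declaring the default stream as raw output yields DataStream<String>
		// instead of DataStream<Tuple1<String>>.
		final DataStream<String> result = mergedStream.transform(
				"MyMultiInputBolt",
				BasicTypeInfo.STRING_TYPE_INFO,
				new MergedInputsBoltWrapper<String, String>(
						createMultiInputBolt(),
						new String[] { Utils.DEFAULT_STREAM_ID }));

		result.print();
		env.execute("merged-inputs bolt sketch");
	}

	// Hypothetical: map each input stream to StormTuple and union the results.
	private static DataStream<StormTuple<String>> buildMergedInput(final StreamExecutionEnvironment env) {
		throw new UnsupportedOperationException("application-specific; sketch only");
	}

	// Hypothetical: an existing Storm bolt that declares one String attribute.
	private static IRichBolt createMultiInputBolt() {
		throw new UnsupportedOperationException("application-specific; sketch only");
	}
}
{% endhighlight %}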