# Gelly: Flink Graph API Gelly is a Graph API for Flink. It contains a set of methods and utilities which aim to simplify the development of graph analysis applications in Flink. In Gelly, graphs can be transformed and modified using high-level functions similar to the ones provided by the batch processing API. Gelly provides methods to create, transform and modify graphs, as well as a library of graph algorithms. ## Using Gelly Gelly is currently part of the _libraries_ Maven project. All relevant classes are located in the _org.apache.flink.graph_ package. Add the following dependency to your `pom.xml` to use Gelly. ``` org.apache.flink flink-gelly_2.11 1.7.1 ``` ``` org.apache.flink flink-gelly-scala_2.11 1.7.1 ``` Note that Gelly is not part of the binary distribution. See [linking](//ci.apache.org/projects/flink/flink-docs-release-1.7/dev/linking.html) for instructions on packaging Gelly libraries into Flink user programs. The remaining sections provide a description of available methods and present several examples of how to use Gelly and how to mix it with the Flink DataSet API. ## Running Gelly Examples The Gelly library jars are provided in the [Flink distribution](https://flink.apache.org/downloads.html "Apache Flink: Downloads") in the **opt** directory (for versions older than Flink 1.2 these can be manually downloaded from [Maven Central](http://search.maven.org/#search|ga|1|flink%20gelly)). To run the Gelly examples the **flink-gelly** (for Java) or **flink-gelly-scala** (for Scala) jar must be copied to Flink’s **lib** directory. ``` cp opt/flink-gelly_*.jar lib/ cp opt/flink-gelly-scala_*.jar lib/ ``` Gelly’s examples jar includes drivers for each of the library methods and is provided in the **examples** directory. After configuring and starting the cluster, list the available algorithm classes: ``` ./bin/start-cluster.sh ./bin/flink run examples/gelly/flink-gelly-examples_*.jar ``` The Gelly drivers can generate graph data or read the edge list from a CSV file (each node in a cluster must have access to the input file). The algorithm description, available inputs and outputs, and configuration are displayed when an algorithm is selected. Print usage for [JaccardIndex](./library_methods.html#jaccard-index): ``` ./bin/flink run examples/gelly/flink-gelly-examples_*.jar --algorithm JaccardIndex ``` Display [graph metrics](./library_methods.html#metric) for a million vertex graph: ``` ./bin/flink run examples/gelly/flink-gelly-examples_*.jar \ --algorithm GraphMetrics --order directed \ --input RMatGraph --type integer --scale 20 --simplify directed \ --output print ``` The size of the graph is adjusted by the _--scale_ and _--edge_factor_ parameters. The [library generator](./graph_generators.html#rmat-graph) provides access to additional configuration to adjust the power-law skew and random noise. Sample social network data is provided by the [Stanford Network Analysis Project](http://snap.stanford.edu/data/index.html). The [com-lj](http://snap.stanford.edu/data/bigdata/communities/com-lj.ungraph.txt.gz) data set is a good starter size. Run a few algorithms and monitor the job progress in Flink’s Web UI: ``` wget -O - http://snap.stanford.edu/data/bigdata/communities/com-lj.ungraph.txt.gz | gunzip -c > com-lj.ungraph.txt ./bin/flink run -q examples/gelly/flink-gelly-examples_*.jar \ --algorithm GraphMetrics --order undirected \ --input CSV --type integer --simplify undirected --input_filename com-lj.ungraph.txt --input_field_delimiter /figure>\t' \ --output print ./bin/flink run -q examples/gelly/flink-gelly-examples_*.jar \ --algorithm ClusteringCoefficient --order undirected \ --input CSV --type integer --simplify undirected --input_filename com-lj.ungraph.txt --input_field_delimiter /figure>\t' \ --output hash ./bin/flink run -q examples/gelly/flink-gelly-examples_*.jar \ --algorithm JaccardIndex \ --input CSV --type integer --simplify undirected --input_filename com-lj.ungraph.txt --input_field_delimiter /figure>\t' \ --output hash ``` Please submit feature requests and report issues on the user [mailing list](https://flink.apache.org/community.html#mailing-lists) or [Flink Jira](https://issues.apache.org/jira/browse/FLINK). We welcome suggestions for new algorithms and features as well as [code contributions](https://flink.apache.org/contribute-code.html).