diff --git a/README.md b/README.md
index 865dc7f46a26499ea1a93ef453b9021b52938656..4698caee7ff2451ead473682ed751523a0c385c1 100755
--- a/README.md
+++ b/README.md
@@ -485,6 +485,13 @@ full cell -> mitochondrial intermembrane space
 ```
 the definition for the mitochondrion is fully contained within the melanosome membrane definition and so testing that group should try and account for the mitochondrion. This can be done with the `HierarchicalEnrichment` routine exemplified above. We know that the melanosome membrane is associated with sight and that being diabetic is associated with mitochondrial dysfunction, but also that diabetic retinopathy affects diabetics and we also see that there is a knowledge based genetic connection relating these two spatially distinct regions of the cell.
 
+# [Example 9](https://gist.githubusercontent.com/richardtjornhammar/e84056e0b10f8d550258a1e8944ee375/raw/45fb8322487ff3a384e7f56eb06ac1073aee4da1/example9.py): Impetuous [deterministic DBSCAN](https://github.com/richardtjornhammar/impetuous/blob/master/src/impetuous/clustering.py) (search for `dbscan`)
+
+[DBSCAN](https://en.wikipedia.org/wiki/DBSCAN) is a clustering algorithm that can be seen as a way of rejecting points that lie in low-density regions of a point cloud. This introduces holes, so a larger segment that would otherwise be connected through a sparse link may become disconnected and split into two segments, or clusters. The rejection criterion is simple. The central step is to evaluate the distance matrix and apply a cutoff, epsilon, which turns each distance into a true or false value depending on whether the distance between points i and j is within the cutoff. The resulting binary neighbour matrix tells you whether or not two points are neighbours. The rejection criterion applied here is that a point with fewer than `minPts` neighbours is not part of any cluster. Once you have calculated the distance matrix you can immediately evaluate the number of neighbours each point has by summing the rows of the neighbour matrix, and collect the outcome of the `minPts` test in a boolean rejection vector R. If R is True for a point, all pairwise distances of that point in the distance matrix are set to a value larger than epsilon. This ensures that the neighbour search rejects the point as a neighbour of any other point for the chosen epsilon. By tracing out all points that are neighbours and assessing their [connectivity](https://github.com/richardtjornhammar/impetuous/blob/master/src/impetuous/clustering.py) you can find all the clusters.
+
+In this [example](https://gist.githubusercontent.com/richardtjornhammar/e84056e0b10f8d550258a1e8944ee375/raw/45fb8322487ff3a384e7f56eb06ac1073aee4da1/example9.py) we do exactly this for two Gaussian point clouds. The DBSCAN search is a single line, `dbscan ( data_frame = point_cloud_df , eps=0.45 , minPts=4 )`, while the last 27 lines only plot the [results](https://bl.ocks.org/richardtjornhammar/raw/0cc0ff037e88c76a9d65387155674fd1/?raw=true) (graph revision dates: [revisions](https://gist.github.com/richardtjornhammar/0cc0ff037e88c76a9d65387155674fd1/revisions)).
+
+
 
 # Notes
 These examples were meant as illustrations of some of the codes implemented in the impetuous-gfa package. The impetuous visualisation codes requires [Bokeh](https://docs.bokeh.org/en/latest/index.html) and are still being migrated to work with the latest Bokeh versions.
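The DBSCAN procedure described in Example 9 (cutoff distance matrix, neighbour counts, rejection vector R, connectivity tracing) can be sketched in plain Python. This is a minimal, hypothetical sketch and not the impetuous `dbscan` routine: the function name `deterministic_dbscan`, the use of `<=` for "within the cutoff", and counting each point as its own neighbour (its zero self-distance is inside the cutoff) are assumptions. It builds the full O(n²) neighbour matrix, so it is only meant for small point sets.

```python
import math
from collections import deque


def deterministic_dbscan(points, eps, min_pts):
    """Hypothetical sketch of the neighbour-matrix DBSCAN variant.

    Returns one label per point: -1 for rejected points,
    otherwise a cluster index starting at 0.
    """
    n = len(points)
    # Binary neighbour matrix: True when the pair distance is within eps.
    # Assumption: the diagonal is True, so a point counts itself.
    neighbours = [[math.dist(points[i], points[j]) <= eps for j in range(n)]
                  for i in range(n)]
    # Rejection vector R: points with fewer than min_pts neighbours.
    rejected = [sum(row) < min_pts for row in neighbours]
    # Same effect as pushing a rejected point's distances above eps:
    # it is no longer a neighbour of any point.
    for i in range(n):
        for j in range(n):
            if rejected[i] or rejected[j]:
                neighbours[i][j] = False
    # Trace connectivity among the remaining points with a breadth-first search.
    labels = [-1] * n
    cluster = 0
    for start in range(n):
        if rejected[start] or labels[start] != -1:
            continue
        labels[start] = cluster
        queue = deque([start])
        while queue:
            i = queue.popleft()
            for j in range(n):
                if neighbours[i][j] and labels[j] == -1:
                    labels[j] = cluster
                    queue.append(j)
        cluster += 1
    return labels


# Two dense groups on a line, joined only by a sparse "link" point at x = 0.7.
points = [(0.0, 0.0), (0.1, 0.0), (0.2, 0.0), (0.3, 0.0),
          (0.7, 0.0),
          (1.1, 0.0), (1.2, 0.0), (1.3, 0.0), (1.4, 0.0)]
print(deterministic_dbscan(points, eps=0.45, min_pts=4))  # → [0, 0, 0, 0, -1, 1, 1, 1, 1]
print(deterministic_dbscan(points, eps=0.45, min_pts=1))  # → [0, 0, 0, 0, 0, 0, 0, 0, 0]
```

With `min_pts=4` the link point has only three neighbours (itself and one point on each side), so it is rejected and the two dense groups fall apart into separate clusters; with `min_pts=1` nothing is rejected and the link joins everything into one cluster.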