diff --git a/doc/development/README.md b/doc/development/README.md index 5a33c46c6203ec70c64a2b9dd91798ac02955e52..9547f9a285f91e5b77b56febab2323f42b60ac40 100644 --- a/doc/development/README.md +++ b/doc/development/README.md @@ -38,6 +38,7 @@ description: 'Learn how to contribute to GitLab.' - [Sidekiq guidelines](sidekiq_style_guide.md) for working with Sidekiq workers - [Working with Gitaly](gitaly.md) - [Manage feature flags](feature_flags.md) +- [Licensed feature availability](licensed_feature_availability.md) - [View sent emails or preview mailers](emails.md) - [Shell commands](shell_commands.md) in the GitLab codebase - [`Gemfile` guidelines](gemfile.md) @@ -48,6 +49,7 @@ description: 'Learn how to contribute to GitLab.' - [How to dump production data to staging](db_dump.md) - [Working with the GitHub importer](github_importer.md) - [Import/Export development documentation](import_export.md) +- [Elasticsearch integration docs](elasticsearch.md) - [Working with Merge Request diffs](diffs.md) - [Kubernetes integration guidelines](kubernetes.md) - [Permissions](permissions.md) @@ -55,6 +57,7 @@ description: 'Learn how to contribute to GitLab.' - [Guidelines for reusing abstractions](reusing_abstractions.md) - [DeclarativePolicy framework](policies.md) - [How Git object deduplication works in GitLab](git_object_deduplication.md) +- [Geo development](geo.md) ## Performance guides diff --git a/doc/development/contributing/merge_request_workflow.md b/doc/development/contributing/merge_request_workflow.md index 5e310092a6e714ce4458ac80f17d180b4236931a..8a4aa5dfa7f71ff4c034c547f80691998c98b65d 100644 --- a/doc/development/contributing/merge_request_workflow.md +++ b/doc/development/contributing/merge_request_workflow.md @@ -155,7 +155,7 @@ the contribution acceptance criteria below: restarting the failing CI job, rebasing from master to bring in updates that may resolve the failure, or if it has not been fixed yet, ask a developer to help you fix the test. -1. 
The MR initially contains a a few logically organized commits. +1. The MR initially contains a few logically organized commits. 1. The changes can merge without problems. If not, you should rebase if you're the only one working on your feature branch, otherwise merge `master`. 1. Only one specific issue is fixed or one specific feature is implemented. Do not diff --git a/doc/development/elasticsearch.md b/doc/development/elasticsearch.md new file mode 100644 index 0000000000000000000000000000000000000000..0c9e790871383dcffdfd7b83179998392a2749ce --- /dev/null +++ b/doc/development/elasticsearch.md @@ -0,0 +1,166 @@ +# Elasticsearch knowledge **[STARTER ONLY]** + +This area is a compendium of useful information for working with Elasticsearch. + +Information on how to enable Elasticsearch and perform the initial indexing is kept in the [Elasticsearch integration documentation](https://docs.gitlab.com/ee/integration/elasticsearch.html#enabling-elasticsearch). + +## Initial installation on OS X + +It is recommended to use the Docker image. After installing Docker, you can immediately spin up an instance with + +``` +docker run --name elastic56 -p 9200:9200 -p 9300:9300 -e "discovery.type=single-node" docker.elastic.co/elasticsearch/elasticsearch:5.6.12 +``` + +and use `docker stop elastic56` and `docker start elastic56` to stop/start it. + +### Installing on the host + +We currently only support Elasticsearch [5.6 to 6.x](https://docs.gitlab.com/ee/integration/elasticsearch.html#requirements). + +Version 5.6 is available on Homebrew and is the recommended version to use in order to test compatibility. + +``` +brew install elasticsearch@5.6 +``` + +There is no need to install any plugins. + +## New repo indexer (beta) + +If you're interested in working with the new beta repo indexer, all you need to do is: + +- `git clone git@gitlab.com:gitlab-org/gitlab-elasticsearch-indexer.git` +- `make` +- `make install` + +This adds `gitlab-elasticsearch-indexer` to `$GOPATH/bin`; make sure that directory is in your `$PATH`.
After that, GitLab will find it and you'll be able to enable it in the admin settings area. + +NOTE: **Note:** +`make` will not recompile the executable unless you run `make clean` beforehand. + +## Helpful rake tasks + +- `gitlab:elastic:test:index_size`: Tells you how much space the current index is using, as well as how many documents are in the index. +- `gitlab:elastic:test:index_size_change`: Outputs index size, reindexes, and outputs index size again. Useful when testing improvements to indexing size. + +Additionally, if you need large repos or multiple forks for testing, please consider [following these instructions](https://docs.gitlab.com/ee/development/rake_tasks.html#extra-project-seed-options). + +## How does it work? + +The Elasticsearch integration depends on an external indexer. We ship a [Ruby indexer](https://gitlab.com/gitlab-org/gitlab-ee/blob/master/bin/elastic_repo_indexer) by default but are also working on an [indexer written in Go](https://gitlab.com/gitlab-org/gitlab-elasticsearch-indexer). The user must trigger the initial indexing via a rake task, but after this is done GitLab itself will trigger reindexing when required via `after_` callbacks on create, update, and destroy that are inherited from [/ee/app/models/concerns/elastic/application_search.rb](https://gitlab.com/gitlab-org/gitlab-ee/blob/master/ee/app/models/concerns/elastic/application_search.rb). + +All indexing after the initial one is done via `ElasticIndexerWorker` (Sidekiq jobs). + +Search queries are generated by the concerns found in [ee/app/models/concerns/elastic](https://gitlab.com/gitlab-org/gitlab-ee/tree/master/ee/app/models/concerns/elastic). These concerns are also in charge of access control, and have been a historic source of security bugs, so please pay close attention to them!
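The callback-to-worker flow described above can be sketched in plain Ruby. This is an illustrative stand-in only, not actual GitLab code: the real integration enqueues `ElasticIndexerWorker` jobs from `after_` callbacks, which the fake classes below merely mimic.

```ruby
# Stand-in for the Sidekiq worker; the real code calls
# ElasticIndexerWorker.perform_async(...) from model callbacks.
class FakeElasticIndexerWorker
  def self.jobs
    @jobs ||= []
  end

  def self.perform_async(operation, klass, id)
    jobs << [operation, klass, id]
  end
end

# A concern in the spirit of Elastic::ApplicationSearch: hooks run on
# create/update/destroy and schedule a reindexing job for the record.
module ApplicationSearchLike
  def save
    operation = @persisted ? :update : :index
    @persisted = true
    FakeElasticIndexerWorker.perform_async(operation, self.class.name, id)
  end

  def destroy
    FakeElasticIndexerWorker.perform_async(:delete, self.class.name, id)
  end
end

class Issue
  include ApplicationSearchLike
  attr_reader :id

  def initialize(id)
    @id = id
  end
end

issue = Issue.new(42)
issue.save     # enqueues an :index job
issue.save     # enqueues an :update job
issue.destroy  # enqueues a :delete job
```

The point of the pattern is that nothing indexes synchronously: every model change only enqueues a small job, and the worker does the slow Elasticsearch call.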
+ +## Existing Analyzers/Tokenizers/Filters +These are all defined in https://gitlab.com/gitlab-org/gitlab-ee/blob/master/ee/lib/elasticsearch/git/model.rb + +### Analyzers +#### `path_analyzer` +Used when indexing blobs' paths. Uses the `path_tokenizer` and the `lowercase` and `asciifolding` filters. + +Please see the `path_tokenizer` explanation below for an example. + +#### `sha_analyzer` +Used in blobs and commits. Uses the `sha_tokenizer` and the `lowercase` and `asciifolding` filters. + +Please see the `sha_tokenizer` explanation below for an example. + +#### `code_analyzer` +Used when indexing a blob's filename and content. Uses the `whitespace` tokenizer and the following filters: `code`, `edgeNGram_filter`, `lowercase`, and `asciifolding`. + +The `whitespace` tokenizer was selected in order to have more control over how tokens are split. For example, the string `Foo::bar(4)` needs to generate tokens like `Foo` and `bar(4)` in order to be properly searched. + +Please see the `code` filter for an explanation of how tokens are split. + +#### `code_search_analyzer` +Not directly used for indexing, but rather used to transform a search input. Uses the `whitespace` tokenizer and the `lowercase` and `asciifolding` filters. + +### Tokenizers +#### `sha_tokenizer` +This is a custom tokenizer that uses the [`edgeNGram` tokenizer](https://www.elastic.co/guide/en/elasticsearch/reference/5.5/analysis-edgengram-tokenizer.html) to allow SHAs to be searchable by any subset of them (minimum of 5 characters). + +Example: + +`240c29dc7e` becomes: +- `240c2` +- `240c29` +- `240c29d` +- `240c29dc` +- `240c29dc7` +- `240c29dc7e` + +#### `path_tokenizer` +This is a custom tokenizer that uses the [`path_hierarchy` tokenizer](https://www.elastic.co/guide/en/elasticsearch/reference/5.5/analysis-pathhierarchy-tokenizer.html) with `reverse: true` in order to allow searches to find paths no matter how much or how little of the path is given as input.
+ +Example: + +`'/some/path/application.js'` becomes: +- `'/some/path/application.js'` +- `'some/path/application.js'` +- `'path/application.js'` +- `'application.js'` + +### Filters +#### `code` +Uses a [Pattern Capture token filter](https://www.elastic.co/guide/en/elasticsearch/reference/5.5/analysis-pattern-capture-tokenfilter.html) to split tokens into more easily searched versions of themselves. + +Patterns: +- `"(\\p{Ll}+|\\p{Lu}\\p{Ll}+|\\p{Lu}+)"`: captures CamelCased and lowerCamelCased strings as separate tokens +- `"(\\d+)"`: extracts digits +- `"(?=([\\p{Lu}]+[\\p{L}]+))"`: captures CamelCased strings recursively. Ex: `ThisIsATest` => `[ThisIsATest, IsATest, ATest, Test]` +- `'"((?:\\"|[^"]|\\")*)"'`: captures terms inside quotes, removing the quotes +- `"'((?:\\'|[^']|\\')*)'"`: same as above, for single quotes +- `'\.([^.]+)(?=\.|\s|\Z)'`: separates terms with periods in-between +- `'\/?([^\/]+)(?=\/|\b)'`: separates path terms `like/this/one` + +#### `edgeNGram_filter` +Uses an [Edge NGram token filter](https://www.elastic.co/guide/en/elasticsearch/reference/5.5/analysis-edgengram-tokenfilter.html) to allow inputs with only parts of a token to find the token. For example, it would turn `glasses` into permutations starting with `gl` and ending with `glasses`, which would allow a search for "`glass`" to find the original token `glasses`. + +## Gotchas + +- Searches can have their own analyzers.
Remember to check them when editing analyzers. +- `Character` filters (as opposed to token filters) always replace the original character, so they're not a good choice as they can hinder exact searches. + +## Troubleshooting + +### Getting "flood stage disk watermark [95%] exceeded" + +You might get an error such as: + +``` +[2018-10-31T15:54:19,762][WARN ][o.e.c.r.a.DiskThresholdMonitor] [pval5Ct] + flood stage disk watermark [95%] exceeded on + [pval5Ct7SieH90t5MykM5w][pval5Ct][/usr/local/var/lib/elasticsearch/nodes/0] free: 56.2gb[3%], + all indices on this node will be marked read-only +``` + +This is because you've exceeded the disk space threshold: Elasticsearch thinks you don't have enough disk space left, based on the default 95% threshold. + +In addition, the `read_only_allow_delete` setting will be set to `true`, which blocks indexing, `forcemerge`, etc. You can inspect the current index settings with: + +``` +curl "http://localhost:9200/gitlab-development/_settings?pretty" +``` + +Add this to your `elasticsearch.yml` file: + +``` +# turn off the disk allocator +cluster.routing.allocation.disk.threshold_enabled: false +``` + +_or_ + +``` +# set your own limits +cluster.routing.allocation.disk.threshold_enabled: true +cluster.routing.allocation.disk.watermark.flood_stage: 5gb # ES 6.x only +cluster.routing.allocation.disk.watermark.low: 15gb +cluster.routing.allocation.disk.watermark.high: 10gb +``` + +Restart Elasticsearch, and the `read_only_allow_delete` setting will clear on its own.
+ +_from "Disk-based Shard Allocation | Elasticsearch Reference" [5.6](https://www.elastic.co/guide/en/elasticsearch/reference/5.6/disk-allocator.html#disk-allocator) and [6.x](https://www.elastic.co/guide/en/elasticsearch/reference/6.x/disk-allocator.html)_ diff --git a/doc/development/fe_guide/style_guide_scss.md b/doc/development/fe_guide/style_guide_scss.md index 548d72bea93ec1d2d56820bc613b3aa08b5824ad..36880dd746df2c81aa140c58d14552036baaf2ef 100644 --- a/doc/development/fe_guide/style_guide_scss.md +++ b/doc/development/fe_guide/style_guide_scss.md @@ -16,10 +16,12 @@ New utility classes should be added to [`utilities.scss`](https://gitlab.com/git **Background color**: `.bg-variant-shade` e.g. `.bg-warning-400` **Text color**: `.text-variant-shade` e.g. `.text-success-500` + - variant is one of 'primary', 'secondary', 'success', 'warning', 'error' - shade is one of the shades listed on [colors](https://design.gitlab.com/foundations/colors/) **Font size**: `.text-size` e.g. `.text-2` + - **size** is a number from 1-6 from our [Type scale](https://design.gitlab.com/foundations/typography) ### Naming diff --git a/doc/development/geo.md b/doc/development/geo.md new file mode 100644 index 0000000000000000000000000000000000000000..d8669d377b0b87d4c2a753be1d01d76c98b28496 --- /dev/null +++ b/doc/development/geo.md @@ -0,0 +1,417 @@ +# Geo (development) **[PREMIUM ONLY]** + +Geo connects GitLab instances together. One GitLab instance is +designated as a **primary** node and can be run with multiple +**secondary** nodes. Geo orchestrates quite a few components that are +described in more detail below. + +## Database replication + +Geo uses [streaming replication](#streaming-replication) to replicate +the database from the **primary** to the **secondary** nodes. This +replication gives the **secondary** nodes access to all the data saved +in the database, so users can log in on the **secondary** node and read all +the issues, merge requests, etc. there.
+ +## Repository replication + +Geo also replicates repositories. Each **secondary** node keeps track of +the state of every repository in the [tracking database](#tracking-database). + +A repository gets replicated by one of the following: + +- [Repository Sync worker](#repository-sync-worker). +- [Geo Log Cursor](#geo-log-cursor). + +### Project Registry + +The `Geo::ProjectRegistry` class defines the model used to track the +state of repository replication. For each project in the main +database, one record in the tracking database is kept. + +It records the following about repositories: + +- The last time they were synced. +- The last time they were synced successfully. +- If they need to be resynced. +- When a retry should be attempted. +- The number of retries. +- If and when they were verified. + +It also stores these attributes for project wikis in dedicated columns. + +### Repository Sync worker + +The `Geo::RepositorySyncWorker` class runs periodically in the +background and searches the `Geo::ProjectRegistry` model for +projects that need updating. Those projects can be: + +- Unsynced: Projects that have never been synced on the **secondary** + node and so do not exist yet. +- Updated recently: Projects that have a `last_repository_updated_at` + timestamp that is more recent than the `last_repository_successful_sync_at` + timestamp in the `Geo::ProjectRegistry` model. +- Manual: The admin can manually flag a repository to resync in the + [Geo admin panel](https://docs.gitlab.com/ee/user/admin_area/geo_nodes.html). + +When we fail to fetch a repository on the **secondary** node `RETRIES_BEFORE_REDOWNLOAD` +times, Geo does a so-called _redownload_. It will do a clean clone +into the `@geo-temporary` directory in the root of the storage. When +it's successful, we replace the main repository with the newly cloned one. + +### Geo Log Cursor + +The Geo Log Cursor is a separate process running on +each **secondary** node.
It monitors the [Geo Event Log](#geo-event-log) +and handles all of the events. When it sees an unhandled event, it +starts a background worker to handle that event, depending on the type +of event. + +When a repository receives an update, the Geo **primary** node creates +a Geo event with an associated repository updated event. The cursor +picks that up and schedules a `Geo::ProjectSyncWorker` job, which uses +the `Geo::RepositorySyncService` and `Geo::WikiSyncService` +classes to update the repository and the wiki. + +## Uploads replication + +File uploads are also replicated to the **secondary** node. To +track the state of syncing, the `Geo::FileRegistry` model is used. + +### File Registry + +Similar to the [Project Registry](#project-registry), there is a +`Geo::FileRegistry` model that tracks the synced uploads. + +CI job artifacts are synced similarly to uploads and LFS +objects, but they are tracked by the `Geo::JobArtifactRegistry` model. + +### File Download Dispatch worker + +Also similar to the [Repository Sync worker](#repository-sync-worker), +there is a `Geo::FileDownloadDispatchWorker` class that is run +periodically to sync all uploads that aren't synced to the Geo +**secondary** node yet. + +Files are copied via HTTP(S), initiated via the +`/api/v4/geo/transfers/:type/:id` endpoint, +e.g. `/api/v4/geo/transfers/lfs/123`. + +## Authentication + +To authenticate file transfers, each `GeoNode` record has two fields: + +- A public access key (`access_key` field). +- A secret access key (`secret_access_key` field). + +The **secondary** node authenticates itself via a [JWT request](https://jwt.io/).
+ +When the **secondary** node wishes to download a file, it sends an +HTTP request with the `Authorization` header: + +``` +Authorization: GL-Geo <access_key>:<JWT payload> +``` + +The **primary** node uses the `access_key` field to look up the +corresponding Geo **secondary** node and decrypts the JWT payload, +which contains additional information to identify the file +request. This ensures that the **secondary** node downloads the right +file for the right database ID. For example, for an LFS object, the +request must also include the SHA256 sum of the file. An example JWT +payload looks like: + +``` +{ "data": { sha256: "31806bb23580caab78040f8c45d329f5016b0115" }, iat: "1234567890" } +``` + +If the requested file matches the requested SHA256 sum, then the Geo +**primary** node sends data via the [X-Sendfile](https://www.nginx.com/resources/wiki/start/topics/examples/xsendfile/) +feature, which allows NGINX to handle the file transfer without tying +up Rails or Workhorse. + +NOTE: **Note:** +JWT requires synchronized clocks between the machines +involved, otherwise it may fail with an encryption error. + +## Using the Tracking Database + +Along with the main database that is replicated, a Geo **secondary** +node has its own separate [Tracking database](#tracking-database). + +The tracking database contains the state of the **secondary** node. + +Any database migration that needs to be run as part of an upgrade +needs to be applied to the tracking database on each **secondary** node. + +### Configuration + +The database configuration is set in [`config/database_geo.yml`](https://gitlab.com/gitlab-org/gitlab-ee/blob/master/config/database_geo.yml.postgresql). +The directory [`ee/db/geo`](https://gitlab.com/gitlab-org/gitlab-ee/tree/master/ee/db/geo) +contains the schema and migrations for this database.
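For orientation, the file follows the standard Rails per-environment `database.yml` layout. This is only an illustrative sketch: the database name and host below are assumptions, so check the checked-in `config/database_geo.yml.postgresql` example for the real values.

```yaml
# Illustrative sketch of a Rails-style database config.
# Keys follow the usual database.yml shape; values are assumptions.
production:
  adapter: postgresql
  encoding: unicode
  database: gitlabhq_geo_production
  host: localhost
```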
+ +To write a migration for the database, use the `GeoMigrationGenerator`: + +``` +rails g geo_migration [args] [options] +``` + +To migrate the tracking database, run: + +``` +bundle exec rake geo:db:migrate +``` + +### Foreign Data Wrapper + +The use of [FDW](#fdw) was introduced in GitLab 10.1. + +This is useful for the [Geo Log Cursor](#geo-log-cursor) and improves +the performance of some synchronization operations. + +While FDW is available in older versions of PostgreSQL, we needed to +raise the minimum required version to 9.6 as this includes many +performance improvements to the FDW implementation. + +#### Refreshing the Foreign Tables + +Whenever the database schema changes on the **primary** node, the +**secondary** node will need to refresh its foreign tables by running +the following: + +```sh +bundle exec rake geo:db:refresh_foreign_tables +``` + +Failure to do this will prevent the **secondary** node from +functioning properly. The **secondary** node will generate error +messages, such as the following PostgreSQL error: + +``` +ERROR: relation "gitlab_secondary.ci_job_artifacts" does not exist at character 323 +STATEMENT: SELECT a.attname, format_type(a.atttypid, a.atttypmod), + pg_get_expr(d.adbin, d.adrelid), a.attnotnull, a.atttypid, a.atttypmod + FROM pg_attribute a LEFT JOIN pg_attrdef d + ON a.attrelid = d.adrelid AND a.attnum = d.adnum + WHERE a.attrelid = '"gitlab_secondary"."ci_job_artifacts"'::regclass + AND a.attnum > 0 AND NOT a.attisdropped + ORDER BY a.attnum +``` + +## Finders + +Geo uses [Finders](https://gitlab.com/gitlab-org/gitlab-ee/tree/master/app/finders), +which are classes that take care of the heavy lifting of looking up +projects/attachments/etc. in the tracking database and main database. + +### Finders Performance + +The Finders need to compare data from the main database with data in +the tracking database.
For example, counting the number of synced +projects normally involves retrieving the project IDs from one +database and checking their state in the other database. This is slow +and requires a lot of memory. + +To overcome this, the Finders use [FDW](#fdw), or Foreign Data +Wrappers. This allows a regular `JOIN` between the main database and +the tracking database. + +## Redis + +Redis on the **secondary** node works the same as on the **primary** +node. It is used for caching, storing sessions, and other persistent +data. + +Redis data replication between **primary** and **secondary** nodes is +not used, so sessions etc. aren't shared between nodes. + +## Object Storage + +GitLab can optionally use Object Storage to store data it would +otherwise store on disk. These things can be: + + - LFS Objects + - CI Job Artifacts + - Uploads + +Objects that are stored in object storage are not handled by Geo. Geo +ignores items in object storage. Either: + +- The object storage layer should take care of its own geographical + replication. +- All secondary nodes should use the same storage node. + +## Verification + +### Repository verification + +Repositories are verified with a checksum. + +The **primary** node calculates a checksum on the repository. It +basically hashes all Git refs together and stores that hash in the +`project_repository_states` table of the database. + +The **secondary** node does the same to calculate the hash of its +clone, and compares the hash with the value the **primary** node +calculated. If there is a mismatch, Geo will mark this as a mismatch +and the administrator can see this in the [Geo admin panel](https://docs.gitlab.com/ee/user/admin_area/geo_nodes.html). + +## Glossary + +### Primary node + +A **primary** node is the single node in a Geo setup that has read-write +capabilities. It's the single source of truth and the Geo +**secondary** nodes replicate their data from there. + +In a Geo setup, there can only be one **primary** node.
All +**secondary** nodes connect to that **primary**. + +### Secondary node + +A **secondary** node is a read-only replica of the **primary** node +running in a different geographical location. + +### Streaming replication + +Geo depends on the streaming replication feature of PostgreSQL. It +completely replicates the database data and the database schema. The +database replica is a read-only copy. + +Streaming replication depends on the Write Ahead Logs, or WAL. Those +logs are copied over to the replica and replayed there. + +Since streaming replication also replicates the schema, the database +migrations do not need to run on the secondary nodes. + +### Tracking database + +A database on each Geo **secondary** node that keeps state for the node +on which it resides. Read more in [Using the Tracking database](#using-the-tracking-database). + +### FDW + +Foreign Data Wrapper, or FDW, is a feature built into PostgreSQL. It +allows data to be queried from different data sources. In Geo, it's +used to query data from different PostgreSQL instances. + +## Geo Event Log + +The Geo **primary** stores events in the `geo_event_log` table. Each +entry in the log contains a specific type of event. These types of +events include: + + - Repository Deleted event + - Repository Renamed event + - Repositories Changed event + - Repository Created event + - Hashed Storage Migrated event + - Lfs Object Deleted event + - Hashed Storage Attachments event + - Job Artifact Deleted event + - Upload Deleted event + +### Geo Log Cursor + +The process running on the **secondary** node that looks for new +`Geo::EventLog` rows. + +## Code features + +### `Gitlab::Geo` utilities + +Small utility methods related to Geo go into the +[`ee/lib/gitlab/geo.rb`](https://gitlab.com/gitlab-org/gitlab-ee/blob/master/ee/lib/gitlab/geo.rb) +file. + +Many of these methods are cached using the `RequestStore` class, to +reduce the performance impact of using the methods throughout the +codebase.
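The per-request caching pattern can be illustrated with a small plain-Ruby sketch. All names here are stand-ins (the real code uses the `RequestStore` gem, which provides a global store that is wiped between requests):

```ruby
# Plain-Ruby stand-in for RequestStore: a cache that lives for the
# duration of one request and is wiped at the request boundary.
class FakeRequestStore
  def self.store
    @store ||= {}
  end

  def self.fetch(key)
    return store[key] if store.key?(key)

    store[key] = yield
  end

  def self.clear!
    @store = {}
  end
end

$db_lookups = 0

# In the spirit of Gitlab::Geo.current_node: however often this is
# called within one request, the expensive lookup runs only once.
def current_node
  FakeRequestStore.fetch(:geo_current_node) do
    $db_lookups += 1
    { host: "geo.example.com", port: 443 } # pretend database row
  end
end

3.times { current_node }  # only the first call hits the "database"
FakeRequestStore.clear!   # simulates the boundary between two requests
current_node              # a new request triggers one fresh lookup
```

The trade-off is that a cached value can go stale only within a single request, which is usually acceptable for slow-changing data like node configuration.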
+ +#### Current node + +The class method `.current_node` returns the `GeoNode` record for the +current node. + +We use the `host`, `port`, and `relative_url_root` values from +`gitlab.yml` and search in the database to identify which node we are +on (see `GeoNode.current_node`). + +#### Primary or secondary + +To determine whether the current node is a **primary** node or a +**secondary** node, use the `.primary?` and `.secondary?` class +methods. + +It is possible for these methods to both return `false` on a node when +the node is not enabled. See [Enablement](#enablement). + +#### Geo Database configured? + +There is also an additional gotcha when dealing with things that +happen during initialization time. In a few places, we use the +`Gitlab::Geo.geo_database_configured?` method to check if the node has +the tracking database, which only exists on the **secondary** +node. This overcomes race conditions that could happen during +bootstrapping of a new node. + +#### Enablement + +We consider the Geo feature enabled when the user has a valid license with the +feature included, and they have at least one node defined on the Geo Nodes +screen. + +See the `Gitlab::Geo.enabled?` and `Gitlab::Geo.license_allows?` methods. + +#### Read-only + +All Geo **secondary** nodes are read-only. + +The general principle of a [read-only database](verifying_database_capabilities.md#read-only-database) +applies to all Geo **secondary** nodes. So the +`Gitlab::Database.read_only?` method will always return `true` on a +**secondary** node. + +When some write actions are not allowed because the node is a +**secondary**, consider adding the `Gitlab::Database.read_only?` or +`Gitlab::Database.read_write?` guard, instead of `Gitlab::Geo.secondary?`. + +The database itself will already be read-only in a replicated setup, +so we don't need to take any extra step for that.
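The guard suggestion above can be sketched as follows. This is a schematic plain-Ruby stand-in, not the real `Gitlab::Database` implementation (which inspects the PostgreSQL connection):

```ruby
# Stand-in for Gitlab::Database on a Geo secondary node.
module Database
  def self.read_only?
    true # a replicated secondary always reports read-only
  end

  def self.read_write?
    !read_only?
  end
end

# Guarding on the database mode rather than on Geo.secondary? means the
# same code path also behaves correctly in any other read-only setup.
def create_issue(title)
  return :rejected_read_only if Database.read_only?

  { title: title } # pretend write
end

create_issue("Broken link") # => :rejected_read_only on a secondary
```

This is the design point of the recommendation: the write-guard expresses *why* the write is forbidden (the database is read-only), not *where* the code happens to run.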
+ +## History of communication channel + +The communication channel has changed since the first iteration. Below you can +find the historic decisions and why we moved to new implementations. + +### Custom code (GitLab 8.6 and earlier) + +In GitLab versions before 8.6, custom code was used to handle +notifications from the **primary** node to **secondary** nodes via HTTP +requests. + +### System hooks (GitLab 8.7 to 9.5) + +Later, it was decided to move away from custom code and begin using +system hooks. More people were using them, so +many would benefit from improvements made to this communication layer. + +There is a specific **internal** endpoint in our API code (Grape) +that receives all requests from these System Hooks: +`/api/v4/geo/receive_events`. + +We filter and dispatch each event based on the `event_name` field. + +### Geo Log Cursor (GitLab 10.0 and up) + +Since GitLab 10.0, [System Hooks](#system-hooks-gitlab-87-to-95) are no longer +used and the Geo Log Cursor is used instead. The Log Cursor traverses the +`Geo::EventLog` rows to see if there are changes since the last time +the log was checked and will handle repository updates, deletes, +changes, and renames. + +The table is within the replicated database. This has two advantages over the +old method: + +- Replication is synchronous and we preserve the order of events. +- Replication of the events happens at the same time as the changes in the + database. diff --git a/doc/development/go_guide/index.md b/doc/development/go_guide/index.md index 6dcade3bb516195b795ed80996ad07a318b84250..b9dc3797e5b70aad54aa3fe8ec0a1af65b128020 100644 --- a/doc/development/go_guide/index.md +++ b/doc/development/go_guide/index.md @@ -93,7 +93,7 @@ become available, you will be able to share job templates like this Dependencies should be kept to the minimum. The introduction of a new dependency should be argued in the merge request, as per our [Approval -Guidelines](../code_review.html#approval-guidelines).
Both [License +Guidelines](../code_review.md#approval-guidelines). Both [License +Management](https://docs.gitlab.com/ee/user/project/merge_requests/license_management.html) **[ULTIMATE]** and [Dependency Scanning](https://docs.gitlab.com/ee/user/project/merge_requests/dependency_scanning.html) diff --git a/doc/development/licensed_feature_availability.md b/doc/development/licensed_feature_availability.md new file mode 100644 index 0000000000000000000000000000000000000000..1657d73e0c938e3fe10f8c58b814cc5c5221e730 --- /dev/null +++ b/doc/development/licensed_feature_availability.md @@ -0,0 +1,37 @@ +# Licensed feature availability **[STARTER]** + +As of GitLab 9.4, we've been supporting a simplified version of licensed +feature availability checks via `ee/app/models/license.rb`, both for +on-premise and GitLab.com plans and features. + +## Restricting features scoped by namespaces or projects + +GitLab.com plans are persisted on user groups and namespaces; therefore, if you're adding a +feature such as [Related issues](https://docs.gitlab.com/ee/user/project/issues/related_issues.html) or +[Service desk](https://docs.gitlab.com/ee/user/project/service_desk.html), +it should be restricted at the namespace scope. + +1. Add the feature symbol to the `EES_FEATURES`, `EEP_FEATURES`, or `EEU_FEATURES` constants in + `ee/app/models/license.rb`. Note in `ee/app/models/ee/namespace.rb` that _Bronze_ GitLab.com + features map to on-premise _EES_, _Silver_ to _EEP_, and _Gold_ to _EEU_. +2. Check using: + +```ruby +project.feature_available?(:feature_symbol) +``` + +## Restricting global features (instance) + +However, for features such as [Geo](https://docs.gitlab.com/ee/administration/geo/replication/index.html) and +[Load balancing](https://docs.gitlab.com/ee/administration/database_load_balancing.html), which cannot be restricted +to only a subset of projects or namespaces, the check will be made directly in +the instance license. + +1.
Add the feature symbol to the `EES_FEATURES`, `EEP_FEATURES`, or `EEU_FEATURES` constants in + `ee/app/models/license.rb`. +2. Add the same feature symbol to `GLOBAL_FEATURES`. +3. Check using: + +```ruby +License.feature_available?(:feature_symbol) +``` diff --git a/doc/development/packages.md b/doc/development/packages.md new file mode 100644 index 0000000000000000000000000000000000000000..a3b891d783412e37eafebf7a7884d77f50285f41 --- /dev/null +++ b/doc/development/packages.md @@ -0,0 +1,68 @@ +# Packages **[PREMIUM]** + +This document will guide you through adding support for another [package management system](https://docs.gitlab.com/ee/administration/packages.html) to GitLab. + +See the already supported package types in the [Packages documentation](https://docs.gitlab.com/ee/administration/packages.html). + +Since GitLab packages' UI is pretty generic, it is possible to add support for a new +package system solely through backend changes. This guide is superficial and does +not cover the way the code should be written. However, you can find a good example +by looking at existing merge requests with Maven and NPM support: + +- [NPM registry support](https://gitlab.com/gitlab-org/gitlab-ee/merge_requests/8673). +- [Maven repository](https://gitlab.com/gitlab-org/gitlab-ee/merge_requests/6607). +- [Instance level endpoint for Maven repository](https://gitlab.com/gitlab-org/gitlab-ee/merge_requests/8757). + +## General information + +The existing database model requires the following: + +- Every package belongs to a project. +- Every package file belongs to a package. +- A package can have one or more package files. +- The package model is based on storing information about the package and its version. + +## API endpoints + +Package systems work with GitLab via the API. For example, `ee/lib/api/npm_packages.rb` +implements API endpoints to work with NPM clients.
So, the first thing to do is to +add a new `ee/lib/api/your_name_packages.rb` file with the API endpoints that are +necessary to make the package system client work. Usually that means having +endpoints like: + +- GET package information. +- GET package file content. +- PUT upload package. + +Since the packages belong to a project, it's expected to have a project-level endpoint +for uploading and downloading them. For example: + +``` +GET https://gitlab.com/api/v4/projects//packages/npm/ +PUT https://gitlab.com/api/v4/projects//packages/npm/ +``` + +Group-level and instance-level endpoints are good to have but are optional. + +NOTE: **Note:** +To avoid name conflicts for instance-level endpoints we use +[the package naming convention](https://docs.gitlab.com/ee/user/project/packages/npm_registry.html#package-naming-convention). + +## Configuration + +GitLab has a `packages` section in its configuration file (`gitlab.rb`). +It applies to all package systems supported by GitLab. Usually you don't need +to add anything there. + +Packages can be configured to use object storage, therefore your code must support it. + +## Database + +The current database model allows you to store a name and a version for each package. +Every time you upload a new package, you can either create a new record of `Package` +or add files to an existing record. `PackageFile` should be able to store all file-related +information like the file `name`, `size`, `sha1`, etc. + +If there is specific data necessary to be stored for only one package system support, +consider creating a separate metadata model. See the `packages_maven_metadata` table +and the `Packages::MavenMetadatum` model as an example for package-specific data.
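The relationships described above can be sketched with plain-Ruby stand-ins for the ActiveRecord models. The class and attribute names come from the text; the helper methods are purely illustrative:

```ruby
# Every package belongs to a project.
class Project
  attr_reader :id, :packages

  def initialize(id)
    @id = id
    @packages = []
  end

  def add_package(name, version)
    pkg = Package.new(name, version)
    @packages << pkg
    pkg
  end
end

# A package stores a name and a version and has one or more files.
class Package
  attr_reader :name, :version, :package_files

  def initialize(name, version)
    @name = name
    @version = version
    @package_files = []
  end

  def add_file(file_name, size, sha1)
    @package_files << PackageFile.new(file_name, size, sha1)
  end
end

# Every package file belongs to a package and stores the file-related
# information: name, size, sha1, ...
class PackageFile
  attr_reader :file_name, :size, :sha1

  def initialize(file_name, size, sha1)
    @file_name = file_name
    @size = size
    @sha1 = sha1
  end
end

project = Project.new(1)
pkg = project.add_package("my-lib", "1.0.0")
pkg.add_file("my-lib-1.0.0.tgz", 1024, "1f09d30c707d53f3d16c530dd73d70a6ce7596a9")
```

Package-system-specific data (such as Maven coordinates) would then live in a separate metadata model hanging off `Package`, as the text suggests.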
diff --git a/doc/development/rake_tasks.md b/doc/development/rake_tasks.md index 1ae69127295504094393badb890d5a4745e99578..27fc3231218bb06ad8d5ffcafdca6a7fa340f10d 100644 --- a/doc/development/rake_tasks.md +++ b/doc/development/rake_tasks.md @@ -28,6 +28,24 @@ bin/rake "gitlab:seed:issues[group-path/project-path]" By default, this seeds an average of 2 issues per week for the last 5 weeks per project. +#### Seeding issues for Insights charts **[ULTIMATE]** + +You can seed issues specifically for working with the +[Insights charts](https://docs.gitlab.com/ee/user/group/insights/index.html) with the +`gitlab:seed:insights:issues` task: + +```shell +# All projects +bin/rake gitlab:seed:insights:issues + +# A specific project +bin/rake "gitlab:seed:insights:issues[group-path/project-path]" +``` + +By default, this seeds an average of 10 issues per week for the last 52 weeks +per project. All issues will also be randomly labeled with team, type, severity, +and priority. + ### Automation If you're very sure that you want to **wipe the current database** and refill