Commit c4894b3a, authored by popcor255, committed by tekton-robot

refactor sync script

Without tests, it is difficult for a developer to change a script and be sure that nothing is broken. This patch adds tests with 80% code coverage and refactors the script to improve readability.

This is not a pure refactor, because the sync did not work properly. Several bug reports describe the script generating broken links: #160, #126, #158, #133, #134. This change includes a patch that fixes these bugs: when a relative link points to a file that does not exist, the user is redirected to GitHub. This matters because some links reference a code snippet, test, or example. These artifacts are not docs and are not rendered properly on the site, so it is better to redirect the user to the intended artifact when the file is not synced. This prevents random 404s and preserves the original behavior of the sync script.

There are other ways to fix this bug: manually editing links, adding the files that the broken links reference to the sync config file so that they are downloaded, or removing the broken links. The chosen approach is preferable because broken links are fixed automatically, without manual intervention. As a side effect, the config file no longer needs to be up to date, because the script redirects the user to GitHub.

We want to encourage frequent syncs. Eventually, this script should be Tekton-ified (turned into a Tekton task) and placed in the Tekton catalog repo. Sync needs to become a Tekton task so that every repo can sync docs on a push event.
Parent da6a8a8b
......@@ -3,5 +3,7 @@ public/
package-lock.json
node_modules/
assets/js/version-switcher.js
content/en/docs/Pipelines/
content/en/docs/Triggers/
content/en/docs/
.coverage
.vscode
tkn_web_env
\ No newline at end of file
sync/runtime.txt
\ No newline at end of file
3.7
\ No newline at end of file
# sync
# Sync
This directory includes a helper script for synchronizing contents
from specified Tekton repositories to this repository.
......@@ -6,9 +6,13 @@ from specified Tekton repositories to this repository.
To run this script locally, set up a Python 3 environment with appropriate
Google Cloud Platform credentials, and execute the following command:
**Note:** See [DEVELOPMENT.md](../DEVELOPMENT.md) for the steps to run the entire website locally.
```bash
python3 -m venv tkn_web_env
source tkn_web_env/bin/activate
pip3 install -r requirements.txt
python3 sync.py
python3 sync/sync.py
```
## Usage
......@@ -23,3 +27,71 @@ sync.py:
Try --helpfull to get a list of all flags.
```
## Configuring Directories
The config directory should include the configuration for syncing/curating contents from
specific Tekton repositories.
See `pipelines.yaml` and `triggers.yaml` for more instructions. These two
YAML files control the synchronization/curation from the `tektoncd/pipeline`
and `tektoncd/triggers` repositories respectively.
The YAML files here are used by the scripts in `../sync`.
Each YAML sync file must follow this schema:
```yaml
# Each YAML file under sync/ configures how helper/helper.py synchronizes
# contents of various versions from its source of truth (usually a GitHub
# repository of a Tekton component, such as tektoncd/pipelines) to
# content/ (for the latest version) and vault/ (for earlier versions).

# The name of the component.
# sync.py will use this value to build directories in content/ and vault/.
# It is also used for the list on the rendered website.
component: Foobar
# The order of the component.
displayOrder: 0
# The GitHub repository where documentation resides.
repository: https://github.com/tektoncd/foobar
# The directory in the GitHub repository where contents reside.
docDirectory: docs
# The tags (versions) of contents to sync.
# Note that sync.py and related scripts read tags in the order specified in
# the following list; the first entry in tags will automatically become the
# latest version of contents.
tags:
  # The name of the tag in the GitHub repository.
  - name: master
    # The name to display on tekton.dev.
    # sync.py will use this value in the version switcher and other places.
    displayName: master
    # Key-value pairs of files to sync, where the key is the original filename
    # and the value is the new filename.
    files:
      - foo.md : bar.md
  # To add a new version, append to the list as below
  #- name: v0.8.2
  #  displayName: v0.8.x
  #  files:
  #  - myfiles.md: myfiles.md
# The link to the GitHub tag page.
archive: https://github.com/tektoncd/foobar/tags
```
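To make the schema concrete, the sketch below shows how one config entry could map to download URLs and destination paths. It follows the logic of `get_file_dirs` and `download_resources_to_project` in `sync.py`, but it is an illustration rather than the script itself: `plan_downloads` is a hypothetical helper, the entry is made-up sample data, and the value of `VAULT_DIR` is an assumption because its definition is not shown in this change.

```python
CONTENT_DIR = "./content/en/docs"
VAULT_DIR = "./content/en/vault"  # assumed value; not shown in this change

# Made-up sample entry that follows the schema above.
entry = {
    "component": "Foobar",
    "repository": "https://github.com/tektoncd/foobar/",
    "docDirectory": "docs",
    "tags": [
        {"name": "master", "displayName": "master",
         "files": [{"foo.md": "bar.md"}]},
        {"name": "v0.8.2", "displayName": "v0.8.x",
         "files": [{"myfiles.md": "myfiles.md"}]},
    ],
}


def plan_downloads(entry):
    """Return (source_url, destination_path) pairs for every file (hypothetical helper)."""
    # sync.py removes a single trailing slash; rstrip is a close approximation.
    repository = entry["repository"].rstrip("/")
    doc_directory = entry["docDirectory"].rstrip("/")
    plan = []
    for index, tag in enumerate(entry["tags"]):
        if index == 0:
            # The first tag is the latest version and goes to content/.
            dest_dir = f'{CONTENT_DIR}/{entry["component"]}'
        else:
            # Every other tag is an earlier version and goes to the vault.
            dest_dir = f'{VAULT_DIR}/{entry["component"]}-{tag["displayName"]}'
        host_dir = f'{repository}/raw/{tag["name"]}/{doc_directory}'
        for mapping in tag["files"]:
            for src, dest in mapping.items():
                plan.append((f"{host_dir}/{src}", f"{dest_dir}/{dest}"))
    return plan


for src, dest in plan_downloads(entry):
    print(src, "->", dest)
```

Because the first entry in `tags` is treated as the latest version, reordering the list changes which version is published under `content/`.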
## Mental Model
This diagram helps you build a mental model of how the sync works.
![logical flow of the sync program](../static/images/mental_model.png)
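One step of the sync, as described in the commit message for this change, is the link fallback: a relative link whose target was not synced is pointed at GitHub instead of producing a 404. The sketch below is a simplified illustration of that idea, not the script's implementation. `rewrite_line`, `looks_like_url`, and the regular expression are assumptions made for this example; the real script parses links with `markdown` and `lxml` and checks URLs by trying to open them. Only `github_link` mirrors the function in `sync.py`.

```python
import os
import re

# Simplified Markdown link pattern used only for this sketch.
MD_LINK_RE = re.compile(r"\[([^\]]*)\]\(([^)]*)\)")


def github_link(raw_url, link):
    """Convert a GitHub raw URL prefix to its tree URL and append the link."""
    return f'{raw_url.replace("raw", "tree", 1)}/{link}'


def looks_like_url(link):
    """Hypothetical stand-in for is_url: treat anything with a scheme as a URL."""
    return "://" in link


def rewrite_line(line, raw_url, exists=os.path.isfile):
    """Point links at GitHub when they are not anchors, URLs, or existing files."""
    def fix(match):
        text, target = match.group(1), match.group(2)
        if target.startswith("#") or looks_like_url(target) or exists(target):
            return match.group(0)  # keep the link untouched
        return f"[{text}]({github_link(raw_url, target)})"
    return MD_LINK_RE.sub(fix, line)


raw = "https://github.com/tektoncd/foobar/raw/master/docs"
print(rewrite_line("[example](examples/run.yaml)", raw, exists=lambda p: False))
```

Note that replacing the first occurrence of `raw` assumes the string does not appear earlier in the URL, for example inside a repository name.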
## Running with Docker
To build the Docker image:
**Note: If you try running the container without supplying a config directory, it will fail. Only copy specific values instead of the entire directory. We're primarily trying to avoid pulling in config/, since it's confusing that it will not be used.**
```bash
# You must cd into the correct directory to build the image
docker build -t tekton/web sync/.
```
\ No newline at end of file
......@@ -3,4 +3,9 @@ PyYAML==5.1.2
google-cloud-storage==1.23.0
Jinja2==2.11.1
google-auth==1.14.0
absl-py==0.9.0
\ No newline at end of file
absl-py==0.9.0
urlopen==1.0.0
markdown==3.1.1
lxml==4.5.2
coverage==5.3
flake8==3.8.3
\ No newline at end of file
3.7
\ No newline at end of file
......@@ -7,21 +7,31 @@ import fileinput
import os
import re
import shutil
import markdown
import os.path
import wget
import logging
import yaml
from urllib.request import urlopen
from urllib.request import HTTPError
from urllib.request import URLError
from lxml import etree
from absl import app
from absl import flags
from jinja2 import Environment
from jinja2 import FileSystemLoader
import wget
from yaml import load
from yaml import Loader
FLAGS = flags.FLAGS
# Flag names are globally defined! So in general, we need to be
# careful to pick names that are unlikely to be used by other libraries.
# If there is a conflict, we'll get an error at import time.
flags.DEFINE_string('config', os.path.dirname(os.path.abspath(__file__)) + '/config', 'Config directory', short_name='c')
flags.DEFINE_string(
'config',
os.path.dirname(os.path.abspath(__file__)) + '/config',
'Config directory', short_name='c')
CONTENT_DIR = './content/en/docs'
JS_ASSET_DIR = './assets/js'
......@@ -32,20 +42,101 @@ BUCKET_NAME = 'tekton-website-assets'
GCP_NETLIFY_ENV_CRED = os.environ.get('GCP_CREDENTIAL_JSON')
GCP_PROJECT = os.environ.get('GCP_PROJECT')
RELATIVE_LINKS_RE = r'\[([^\]]*)\]\((?!.*://|/)([^)]*).md(#[^)]*)?\)'
LINKS_RE = r'\[([^\]]*)\]\((?!.*://|/)([^)]*).md(#[^)]*)?\)'
jinja_env = Environment(loader=FileSystemLoader(TEMPLATE_DIR))
def transform_links(link_prefix, dest_prefix, files):
for f in files:
def get_list_of_files(prefix, list_dic):
""" get all the values from the key-value pairs and
insert them into a list with a prefix """
files = []
for f in list_dic:
for k in f:
dest_path = f'{dest_prefix}/{f[k]}'
for line in fileinput.input(dest_path, inplace=1):
line = re.sub(RELATIVE_LINKS_RE, r'[\1](' + link_prefix + r'\2\3)', line.rstrip())
print(line)
files.append(f'{prefix}/{f[k]}')
return files
def transform_text(link_prefix, dest_prefix, files, url):
""" change every link to point to a valid relative file or absolute url """
logging.info(f'Running: transforming files in {dest_prefix}')
payload = (url, link_prefix)
list_of_files = get_list_of_files(dest_prefix, files)
set_lines(list_of_files, payload, transform_links)
logging.info(f'Completed: transformed files in {dest_prefix}')
def transform_links(line, url, link_prefix):
line, is_transformed = sanitize_text(link_prefix, line)
links = get_links(line)
if is_transformed:
for link in links:
link = link.get("href")
if not(os.path.isfile(link) or is_url(link) or is_ref(link)):
line = line.replace(link, github_link(url, link))
print(line)
def set_lines(files, payload, callback):
""" get all the text from the files and replace
each line of text with the list lines """
for line in fileinput.input(files=(files), inplace=1):
# add a line of text to the payload
# Callback function will mutate text and set the lines provided
callback(line, *payload)
def github_link(url, link):
""" given a github raw link convert it to the main github link """
return f'{url.replace("raw", "tree", 1)}/{link}'
def sanitize_text(link_prefix, text):
""" sanitize every line of text to exclude relative
links and to turn markdown file URLs to html """
old_line = text.rstrip()
new_line = re.sub(LINKS_RE, r'[\1](' + link_prefix + r'\2\3)', old_line)
return (new_line, old_line == new_line)
def is_url(url):
""" check if it is a valid url; unreachable hosts still count as valid """
try:
urlopen(url).read()
except (HTTPError, URLError):
return True
except ValueError:
return False
return True
def is_ref(url):
""" determine if the url is an anchor link (starts with #) """
if len(url) <= 0:
return False
return url[0] == "#"
def retrieve_files(url_prefix, dest_prefix, files):
def get_links(md):
""" return a list of all the links in a string formatted in markdown """
md = markdown.markdown(md)
try:
doc = etree.fromstring(md)
return doc.xpath('//a')
except etree.XMLSyntaxError:
pass
return []
def download_files(url_prefix, dest_prefix, files):
""" download the file and create the
correct folders that are necessary """
if os.path.isdir(dest_prefix):
shutil.rmtree(dest_prefix)
os.mkdir(dest_prefix)
......@@ -53,92 +144,132 @@ def retrieve_files(url_prefix, dest_prefix, files):
for k in f:
src_url = f'{url_prefix}/{k}'
dest_path = f'{dest_prefix}/{f[k]}'
print(f'Downloading file (from {src_url} to {dest_path}).\n')
os.makedirs(os.path.dirname(dest_path), exist_ok=True)
wget.download(src_url, out=dest_path)
print('\n')
logging.info(f'Downloading {src_url} to {dest_path}...\n')
try:
wget.download(src_url, out=dest_path)
except (FileExistsError, URLError):
raise Exception(f'download failed for {src_url}')
logging.info('\n')
def verify_name_format(word):
pass
return True
def remove_ending_forward_slash(word):
""" remove the last character if it is a forward slash """
return word[:-1] if word.endswith('/') else word
def sync(sync_config):
component = sync_config['component']
repository = remove_ending_forward_slash(sync_config['repository'])
doc_directory = remove_ending_forward_slash(sync_config['docDirectory'])
tags = sync_config['tags']
def get_file_dirs(entry, index, source_dir, dest_dir):
""" return the files and their directories. Their relative and absolute
counterparts are needed to download the files properly to the website """
tag = entry['tags'][index]
repository = remove_ending_forward_slash(entry['repository'])
doc_directory = remove_ending_forward_slash(entry['docDirectory'])
host_dir = f'{repository}/raw/{tag["name"]}/{doc_directory}'
files = tag['files']
return (host_dir, source_dir, dest_dir, files)
def download_resources_to_project(yaml_list):
""" download the files based on a certain spec.
The YAML sync spec can be found in sync/config/README.md """
for entry in yaml_list:
component = entry['component']
for index, tag in enumerate(entry['tags']):
if index == 0:
# first links belongs on the home page
download_dir = f'/docs/{component}/'
site_dir = f'{CONTENT_DIR}/{component}'
else:
# the other links belong in the other versions a.k.a vault
download_dir = f'/vault/{component}-{tag["displayName"]}/'
site_dir = f'{VAULT_DIR}/{component}-{tag["displayName"]}'
dirs = get_file_dirs(entry, index, download_dir, site_dir)
host_dir, source_dir, dest_dir, files = dirs
download_files(host_dir, dest_dir, files)
transform_text(source_dir, dest_dir, files, host_dir)
def get_files(path, file_type):
""" return a list of all the files with the correct type """
file_list = []
# Get the latest version of contents
url_prefix = f'{repository}/raw/{tags[0]["name"]}/{doc_directory}'
dest_prefix = f'{CONTENT_DIR}/{component}'
files = tags[0]['files']
print(f'Retrieving the latest version ({tags[0]["displayName"]}) of Tekton {component} documentation (from {url_prefix} to {dest_prefix}).\n')
retrieve_files(url_prefix, dest_prefix, files)
transform_links(f'/docs/{component.lower()}/', dest_prefix, files)
# walk through every file in directory and its sub directories
for root, dirs, files in os.walk(path):
for file in files:
# append the file name to the list if is it the correct type
if file.endswith(file_type):
file_list.append(os.path.join(root, file))
# Get the previous versions of contents
for tag in tags[1:]:
url_prefix = f'{repository}/raw/{tag["name"]}/{doc_directory}'
dest_prefix = f'{VAULT_DIR}/{component}-{tag["displayName"]}'
files = tag['files']
print(f'Retrieving version {tag["displayName"]} of Tekton {component} documentation (from {url_prefix} to {dest_prefix}).\n')
retrieve_files(url_prefix, dest_prefix, files)
transform_links(f'/vault/{component.lower()}-{tag["displayName"]}/', dest_prefix, files)
return file_list
def get_component_versions(sync_configs):
def yaml_files_to_dic_list(files):
""" load a list of yaml files into a list of dictionaries
sorted by a field called displayOrder """
dic_list = []
for file in files:
with open(file, 'r') as text:
# get the paths from the config file
dic_list.append(yaml.load(text, Loader=yaml.FullLoader))
dic_list.sort(key=lambda x: x['displayOrder'])
return dic_list
def get_tags(list):
""" return a list of tags with their name and displayName """
tags = []
for tag in list['tags']:
tags.append({'name': tag['name'], 'displayName': tag['displayName']})
return tags
def get_versions(sync_configs):
""" return the list of all the versions and their tags, name, archive """
component_versions = []
for sync_config in sync_configs:
component_versions.append({
'name': sync_config['component'],
'tags': [ {'name': tag['name'], 'displayName': tag['displayName']} for tag in sync_config['tags'] ],
'tags': get_tags(sync_config),
'archive': sync_config['archive']
})
return component_versions
def prepare_version_switcher_script(component_versions):
script_template = jinja_env.get_template('version-switcher.js.template')
script = script_template.render(component_versions_json=json.dumps(component_versions))
with open(f'{JS_ASSET_DIR}/version-switcher.js', 'w') as f:
f.write(script)
def prepare_vault_landing_page(component_versions):
md_template = jinja_env.get_template('_index.md.template')
md = md_template.render(component_versions=component_versions)
with open(f'{VAULT_DIR}/_index.md', 'w') as f:
f.write(md)
def scan(dir_path):
entries = os.scandir(dir_path)
sync_config_paths = []
for entry in entries:
if entry.name.endswith('.yaml'):
sync_config_paths.append(entry.path)
elif entry.is_dir():
scan(entry.path)
return sync_config_paths
def main(argv):
sync_config_paths = scan(f'{FLAGS.config}')
sync_configs = []
for sync_config_path in sync_config_paths:
with open(sync_config_path) as f:
sync_config = load(f, Loader=Loader)
sync_configs.append(sync_config)
sync(sync_config)
sync_configs.sort(key=lambda x: x['displayOrder'])
component_versions = get_component_versions(sync_configs)
prepare_version_switcher_script(component_versions)
prepare_vault_landing_page(component_versions)
def create_resource(dest_prefix, file, versions):
""" create site resource based on the version and file """
resource_template = jinja_env.get_template(f'{file}.template')
if ".js" in file:
serialize = json.dumps(versions)
resource = resource_template.render(component_versions_json=serialize)
elif ".md" in file:
resource = resource_template.render(component_versions=versions)
with open(f'{dest_prefix}/{file}', 'w') as f:
f.write(resource)
def sync(argv):
""" fetch all the files and sync it to the website """
# get the path of the urls needed
config_files = get_files(f'{FLAGS.config}', ".yaml")
config = yaml_files_to_dic_list(config_files)
# download resources
download_resources_to_project(config)
# create version switcher script
create_resource(JS_ASSET_DIR, "version-switcher.js", get_versions(config))
# create index for vault
create_resource(VAULT_DIR, "_index.md", get_versions(config))
if __name__ == '__main__':
app.run(main)
\ No newline at end of file
app.run(sync)
import unittest
import tempfile
import shutil
import ntpath
import os
from sync import get_links
from sync import transform_text
from sync import is_url
from sync import is_ref
from sync import remove_ending_forward_slash
from sync import get_tags
from sync import get_file_dirs
from sync import download_files
from sync import yaml_files_to_dic_list
from sync import get_files
from sync import get_list_of_files
class TestSync(unittest.TestCase):
# Utils
def path_leaf(self, path):
head, tail = ntpath.split(path)
return tail or ntpath.basename(head)
def read_and_delete_file(self, name):
file = open(name, "r")
text = file.read()
file.close()
os.remove(name)
return text
# Tests
def test_get_list_of_files(self):
""" get all the values from a list of dics and return a list """
expected = ["/prefix/f.tmp", "/prefix/t.xt"]
result = get_list_of_files("/prefix", [{"_": "f.tmp", "__": "t.xt"}])
self.assertEqual(result, expected)
def test_multiple_get_links(self):
""" This will ensure that get links will
return a list of multiple md links """
expected = ["www.link.com", "./link"]
result = get_links("this is a [link](www.link.com) and [link](./link)")
for index, link in enumerate(result):
self.assertEqual(link.get("href"), expected[index])
def test_is_ref(self):
""" Verify if a string is a reference. A reference is
defined as a string where its first character is a hashtag """
self.assertEqual(is_ref(""), False)
self.assertEqual(is_ref("#footer"), True)
self.assertEqual(is_ref("www.google.com"), False)
def test_remove_ending_forward_slash(self):
""" Remove a slash if it is the last character in a string """
actual = remove_ending_forward_slash("www.google.com/")
expected = "www.google.com"
self.assertEqual(actual, expected)
def test_get_tags(self):
""" map a list of dictionaries to only
have name, displayName fields """
expected = [{'name': 'test_tag', 'displayName': 'test_display'}]
tags = {'tags': [
{
'name': 'test_tag',
'displayName': 'test_display',
'files': []
},
]}
self.assertEqual(get_tags(tags), expected)
def test_download_files(self):
""" Download file to tmp directory if url is valid """
expected = True
dirpath = tempfile.mkdtemp()
actual = download_files(
"https://raw.githubusercontent.com/tektoncd/pipeline/master",
dirpath,
[{"README.md": "README.md"}]
)
shutil.rmtree(dirpath)
self.assertEqual(actual, expected)
dirpath = tempfile.mkdtemp()
self.assertRaises(
Exception,
download_files,
"http://fake.c0m",
dirpath,
[{"test": "test"}]
)
shutil.rmtree(dirpath)
def test_yaml_files_to_dic_list(self):
""" convert a list of files into a list of dictionaries """
# create a tmp file with yaml txt
text = "{displayOrder: 1}"
actual = None
tmp_name = None
with tempfile.NamedTemporaryFile(delete=False) as tmp:
tmp_name = tmp.name
tmp.write(text.strip().encode())
expected = [{'displayOrder': 1}]
actual = yaml_files_to_dic_list([tmp_name])
self.read_and_delete_file(tmp_name)
self.assertEqual(actual, expected)
def test_get_files(self):
""" create a list of files within a
directory that contain a valid extension"""
expected = None
actual = None
with tempfile.NamedTemporaryFile(delete=True) as tmp:
expected = [tmp.name]
actual = get_files("/tmp", self.path_leaf(tmp.name))
self.assertEqual(actual, expected)
def test_get_file_dirs(self):
expected = (
'https://github.com/tektoncd/cli/raw/master/docs',
"/tmp",
"/tmp",
[{"README.md": "_index.md"}]
)
entry = {
"component": "CLI",
"displayOrder": 2,
"repository": "https://github.com/tektoncd/cli",
"docDirectory": "docs",
"tags": [
{
"name": "master",
"displayName": "master",
"files": [
{
"README.md": "_index.md"
}
]
}
],
"archive": "https://github.com/tektoncd/cli/tags"
}
actual = get_file_dirs(entry, 0, "/tmp", "/tmp")
self.assertEqual(actual, expected)
def test_get_links(self):
""" return a list of links formatted in markdown in a given string"""
actual = "www.link.com"
expected = get_links("")
self.assertEqual([], expected)
expected = get_links("[link](www.link.com) this is a link")
self.assertEqual(actual, expected[0].get("href"))
def test_is_url(self):
"""This will return a test to see if the link is a valid url format"""
expected = is_url("http://www.fake.g00gl3.com")
self.assertEqual(True, expected)
expected = is_url("http://www.google.com")
self.assertEqual(True, expected)
expected = is_url("http://www.github.com")
self.assertEqual(True, expected)
expected = is_url("./sync.py")
self.assertEqual(False, expected)
expected = is_url("www.github.com")
self.assertEqual(False, expected)
def test_transform_text(self):
"""Ensure that transform_text turns links into
relative GitHub links or existing file names"""
expected = """
[invalid-relative-link](test.com/./adw/a/d/awdrelative)
[valid-relative-link](./sync.py)
[valid-absolute-link](www.github.com)
[invalid-absolute-link](https://website-invalid-random321.net)
[valid-ref-link](#footer)
"""
text = """
[invalid-relative-link](./adw/a/d/awdrelative)
[valid-relative-link](./sync.py)
[valid-absolute-link](www.github.com)
[invalid-absolute-link](https://website-invalid-random321.net)
[valid-ref-link](#footer)
"""
actual = None
tmp_name = None
# write to file
with tempfile.NamedTemporaryFile(delete=False) as tmp:
tmp_name = tmp.name
name = self.path_leaf(tmp_name)
tmp.write(text.strip().encode())
# mutate file
transform_text("", "/tmp", [{name: name}], "test.com")
# read and delete file
actual = self.read_and_delete_file(tmp_name)
self.assertEqual(actual.strip(), expected.strip())
if __name__ == '__main__':
unittest.main()
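`test_is_url` above contacts real hosts, so its result can depend on network access. The sketch below shows one way to exercise the same branches offline. It copies the logic of `is_url` from `sync.py` and adds an `opener` parameter, which is an assumption made for this example and is not part of the script. The fake openers are hypothetical test doubles.

```python
from urllib.error import HTTPError, URLError
from urllib.request import urlopen


def is_url(url, opener=urlopen):
    """Same logic as is_url in sync.py, plus a hypothetical `opener` hook."""
    try:
        opener(url).read()
    except (HTTPError, URLError):
        # The URL is well formed, but the host or page is unreachable.
        return True
    except ValueError:
        # urlopen raises ValueError for strings without a URL scheme.
        return False
    return True


class FakeResponse:
    """Minimal stand-in for the object returned by urlopen."""

    def read(self):
        return b""


def reachable(url):
    return FakeResponse()


def unreachable(url):
    raise URLError("simulated DNS failure")


def malformed(url):
    raise ValueError(f"unknown url type: {url!r}")


print(is_url("http://www.github.com", opener=reachable))
print(is_url("http://www.fake.g00gl3.com", opener=unreachable))
print(is_url("./sync.py", opener=malformed))
```

Injecting the opener keeps the production default (`urlopen`) while letting each test decide which branch to trigger.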