Verified commit 080741be authored by Karel van Klink

Include documentation from goat/gap/geant-automation-platform

parent c2ce98ae
1 merge request: !316 Replace Sphinx with MkDocs
Showing
with 548 additions and 45 deletions
...@@ -12,12 +12,12 @@ oss-params.json ...@@ -12,12 +12,12 @@ oss-params.json
build/ build/
# Documentation # Documentation
docs/build site
docs/vale/styles/* vale/styles/*
!docs/vale/styles/config/ !vale/styles/config/
!docs/vale/styles/custom/ !vale/styles/custom/
.DS_Store .DS_Store
.idea .idea
.venv .venv
.env .env
\ No newline at end of file
...@@ -4,8 +4,6 @@ stages: ...@@ -4,8 +4,6 @@ stages:
- documentation - documentation
- sonarqube - sonarqube
- trigger_jenkins_build - trigger_jenkins_build
include:
- docs/.gitlab-ci.yml
#################################### tox - Testing and linting #################################### tox - Testing and linting
run-tox-pipeline: run-tox-pipeline:
...@@ -45,6 +43,43 @@ run-tox-pipeline: ...@@ -45,6 +43,43 @@ run-tox-pipeline:
paths: paths:
- htmlcov - htmlcov
##### Sphinx - Generate documentation
build-documentation:
stage: documentation
tags:
- docker-executor
image: sphinxdoc/sphinx:latest
before_script:
- pip install sphinx_rtd_theme sphinxcontrib-jquery
- cd $CI_PROJECT_DIR/docs/source
script:
- make html
artifacts:
paths:
- $CI_PROJECT_DIR/docs/build/html
##### Vale - Documentation linter
lint-documentation:
stage: documentation
image:
name: jdkato/vale:latest
entrypoint: [""]
tags:
- docker-executor
needs:
- job: build-documentation # Only run when documentation has been built
artifacts: true
before_script:
- cd $CI_PROJECT_DIR/docs/vale
- vale sync
script:
- vale --glob='!*/migrations/*' $CI_PROJECT_DIR/docs/source $CI_PROJECT_DIR/gso
sonarqube: sonarqube:
stage: sonarqube stage: sonarqube
image: sonarsource/sonar-scanner-cli:10.0 image: sonarsource/sonar-scanner-cli:10.0
......
...@@ -2,7 +2,7 @@ ...@@ -2,7 +2,7 @@
set -o errexit set -o errexit
set -o nounset set -o nounset
export OSS_PARAMS_FILENAME=../gso/oss-params-example.json export OSS_PARAMS_FILENAME=gso/oss-params-example.json
export TESTING=true export TESTING=true
pip install mkdocstrings-python mkdocs_gen_files mkdocs-material mkdocs-literate-nav mkdocs-section-index pip install mkdocstrings-python mkdocs_gen_files mkdocs-material mkdocs-literate-nav mkdocs-section-index
......
---
##### Sphinx - Generate documentation
build-documentation:
stage: documentation
tags:
- docker-executor
image: sphinxdoc/sphinx:latest
before_script:
- pip install sphinx_rtd_theme sphinxcontrib-jquery
- cd $CI_PROJECT_DIR/docs/source
script:
- make html
artifacts:
paths:
- $CI_PROJECT_DIR/docs/build/html
##### Vale - Documentation linter
lint-documentation:
stage: documentation
image:
name: jdkato/vale:latest
entrypoint: [""]
tags:
- docker-executor
needs:
- job: build-documentation # Only run when documentation has been built
artifacts: true
before_script:
- cd $CI_PROJECT_DIR/docs/vale
- vale sync
script:
- vale --glob='!*/migrations/*' $CI_PROJECT_DIR/docs/source $CI_PROJECT_DIR/gso
# Ansible
## Design
## Ansible roles and playbooks
## Troubleshooting
# About this section
All information regarding modelling and Ansible mechanics is described in this section.
The structure is:
- WFO
- Modelling
- Ansible
- Design
- Low level description
# Integration with Infoblox
The Infoblox service in GAP takes care of IP resources when creating routers,
IP trunks, and other customer-facing services.
# Integration with Kentik
Routers are added to Kentik when they have a PE role. This happens either when
creating a new PE router, or when upgrading an existing P router to PE.
When the router is added to Kentik, a placeholder license is applied. Then,
a member of the service management team is able to run the
`modify_router_kentik_license` workflow to apply a different license.
# Integration with LibreNMS
When a router is created or terminated, it is added to or removed from LibreNMS, respectively.
# Integration with Netbox
Netbox is used for bookkeeping of interface usage, and suggesting options to
the operator for available LAGs.
# IP trunks
IP trunks are core links between two GÉANT routers.
An IP trunk is a special service, since no VLAN multiplexing is allowed on the interfaces at either end of the trunk.
For this reason, in case of an IP trunk, we do not use the canonical decomposition that leverages a demarcation point.
## Modelling and attributes
The relevant attributes for an IPTrunk are the following:
| Attribute name | Attribute type | Description |
|---------------------------------------|----------------|------------------------------------------------------------------------------------------------------|
| `geant_s_sid` | `str` | GÉANT service ID associated with this trunk. |
| `iptrunk_description` | `str` | A human-readable description of this trunk. |
| `iptrunk_type` | `IptrunkType` | The type of trunk, can be either dark fibre or leased capacity. |
| `iptrunk_speed` | `str` | The speed of the trunk, measured per interface associated with it. Should be of `PhyPortCapacity` type. |
| `iptrunk_minimum_links` | `int` | The minimum number of links the trunk should consist of. |
| `iptrunk_isis_metric` | `int` | The IS-IS metric of this link. |
| `iptrunk_ipv4_network` | `IPv4Network` | The IPv4 network used for this trunk. |
| `iptrunk_ipv6_network` | `IPv6Network` | The IPv6 network used for this trunk. |
| `iptrunk_side_node` | `DeviceBlock` | The router that hosts the A side of the trunk. |
| `iptrunk_side_ae_iface` | `str` | The name of the interface on which the trunk connects. |
| `iptrunk_side_ae_geant_a_sid` | `str` | The service ID of the interface. |
| `iptrunk_side_ae_members` | `list[str]` | A list of interface members that make up the aggregated Ethernet interface. |
| `iptrunk_side_ae_members_description` | `list[str]` | The list of descriptions that describe the list of interface members. |
# Diagram
``` mermaid
classDiagram
Site <|-- Router :belong
class Site{
+UUId name
+String phoneNumber
+String emailAddress
}
class Router{
+int studentNumber
+int averageMark
+isEligibleToEnrol()
+getSeminarsTaken()
}
```
## Node deployment
A node consists of one or more routers, a switch, and a terminal server.
In general -- as laid out more extensively
<a href="https://wiki.geant.org/display/NETENG/001+-+Topology+and+physical+layout" target="_blank">here</a>
(behind login) -- a PoP consists of:
* One or two routers
* One switch
* One terminal server
Globally, the workflow for a new site is as follows:
1. Deploy terminal server:
    1. Generate base configuration from GitLab
    2. Ship the device to its location
    3. Verify reachability and insert in LibreNMS
2. Deploy PoP router in a 'core' fashion
    1. Rack it up and configure the hardware
    2. Connect the router to the terminal server via both a console connection, and FXP
    3. Deploy base configuration using GAP
3. Deploy PoP switch
    1. Rack it up and configure the hardware
    2. Connect the switch to the terminal server via both a console connection, and FXP
    3. Deploy base configuration using GAP
4. Deploy the PoP interconnect between router and switch
    1. Set up a physical connection between router and switch
    2. Deploy configuration using GAP
5. Deploy IP trunks to connect the router to the rest of the network
    1. Set up a physical connection
    2. Deploy configuration using GAP
6. Update the iBGP mesh to include the new router, promoting it to an edge router
    1. Deploy configuration using GAP
    2. Using GAP, insert the devices in LibreNMS
In the context of the automation platform, the PoP interconnects mentioned are modelled as separate objects.
# Routers
Routers are the packet-layer devices that form the backbone of the GÉANT
network. A router only requires an active site subscription to be available,
which represents the location where the router is hosted. Virtually all
services depend on an active router subscription. As a result, this is one of
the most fundamental subscription instances in GSO.
From a bird's-eye view, the process of deploying a new router in the network is as follows:
1. Manually configure the router such that it is reachable from out-of-band (OOB).
2. Upgrade the router to the most recent OS.
3. Deploy base configuration.
4. Configure trunks to connect the router to the network.
5. Update the protocol meshes (such as iBGP).
6. Promote the router to the production environment.
![Provisioning a router in WFO](../../assets/images/WFO_deploy_router.png)
*WFO provisions a new router by following the steps shown here.*
## Modelling and attributes
The attributes of a router are as follows:
| Attribute name | Attribute type | Description |
|--------------------------|----------------|---------------------------------------------------------------------------------------------------------|
| `router_fqdn` | `str` | The FQDN of a router. |
| `router_ts_port` | `PortNumber` | The port of the terminal server that this router is connected to.<br/>Used to offer out of band access. |
| `router_access_via_ts` | `bool` | Whether this router should be accessed through the terminal server, or through its loopback address. |
| `router_lo_ipv4_address` | `IPv4Address` | The IPv4 loopback address of a router. |
| `router_lo_ipv6_address` | `IPv6Address` | The IPv6 loopback address of a router. |
| `router_lo_iso_address` | `str` | The ISO address of the router, used for IS-IS support. |
| `router_role` | `RouterRole` | The role of the router, which can be P, PE, or AMT. |
| `router_site` | `SiteBlock` | The site that this router is located at. |
| `vendor` | `RouterVendor` | The vendor of a router, either Juniper or Nokia. |
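
The sample router data used elsewhere in this documentation pairs the loopback address `74.95.57.63` with the ISO address `49.51e5.0001.0740.9505.7063.00`. The sketch below shows one way such an ISO address could be derived from the IPv4 loopback. The function name `iso_from_ipv4` and the default area prefix and selector are assumptions taken from that single sample, not a confirmed description of how GSO computes the value.

``` python
import ipaddress


def iso_from_ipv4(
    ipv4_address: str,
    area_prefix: str = "49.51e5.0001",  # assumption: copied from the sample data
    selector: str = "00",  # assumption: copied from the sample data
) -> str:
    """Derive an ISO address by zero-padding each IPv4 octet to three digits."""
    # Validate the input and split it into its four octets.
    octets = str(ipaddress.IPv4Address(ipv4_address)).split(".")
    # 74.95.57.63 becomes "074095057063".
    digits = "".join(f"{int(octet):03d}" for octet in octets)
    # Regroup the twelve digits into three blocks of four.
    system_id = ".".join(digits[i : i + 4] for i in range(0, 12, 4))
    return f"{area_prefix}.{system_id}.{selector}"


iso_address = iso_from_ipv4("74.95.57.63")
```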
## Deployment
Deploying a router requires two workflows to be run. The first creates the
router subscription itself and prepares it for insertion into the network.
After this workflow completes, the router needs to be added to the iBGP mesh
of existing routers in the network.
!!! tip
    The creation of a new router also requires an active site subscription.
    Ensure that this is already in place before continuing.
# Sites
Sites are an abstract construction to model information that is shared across
all services deployed at one physical location. They only contain information
relevant to the services that rely on it. As a result, external BSS and OSS
systems are still in place for other accounting and bookkeeping purposes.
However, these lie outside the scope of GAP.
## Modelling and attributes
A Site object contains the following attributes:
| Attribute name | Attribute type | Description |
|--------------------------------------------------|-----------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| `site_name` | `str` | The name of the site, which dictates part of the FQDN of routers that are hosted at this site.<br/>For example: `router.X.Y.geant.net`, where X denotes the name of the site. |
| `site_city` | `str` | The city at which the site is located. |
| `site_country` | `str` | The country in which the site is located. |
| `site_country_code` | `str` | The code of the corresponding country. This is also used for the FQDN:<br/>following the example for the site name, the country code goes in the Y position. |
| `site_latitude` | `LatitudeCoordinate` | The latitude of the site, used for SNMP. |
| `site_longitude` | `LongitudeCoordinate`&nbsp;&nbsp;&nbsp;&nbsp; | The longitude of the site, again for SNMP. |
| `site_internal_id` | `int` | The internal ID used within GÉANT for a site. |
| `site_bgp_community_id` &nbsp;&nbsp;&nbsp;&nbsp; | `int` | The BGP community ID of a site, used to advertise routes learned at this site. |
| `site_tier` | `SiteTier` | The tier of a site, which corresponds to installed equipment. |
| `site_ts_address` | `IPv4Address` | The address of the terminal server hosted at this site.<br/>It is used for out of band access to any equipment hosted here. |
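
As a minimal sketch of the FQDN convention `router.X.Y.geant.net` described for `site_name` and `site_country_code`, the helper below composes an FQDN from its parts. The function name `build_router_fqdn` and the lowercasing of the site name and country code are assumptions made for illustration.

``` python
def build_router_fqdn(hostname: str, site_name: str, site_country_code: str) -> str:
    """Compose an FQDN of the form ``router.X.Y.geant.net``.

    X is the site name and Y is the country code of the site.
    """
    # Assumption: site name and country code appear in lowercase in the FQDN.
    return f"{hostname}.{site_name.lower()}.{site_country_code.lower()}.geant.net"


fqdn = build_router_fqdn("router", "flores", "BB")
```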
# Ansible
Ansible is responsible for:
- Compiling, deploying, and deleting configuration on targeted devices
- Gathering operational information from the targeted devices
The Ansible subsystem is composed of three main functional parts:
- A plugin that is responsible for exposing the Ansible engine to the Workflow Orchestrator via APIs
- A set of Ansible roles and playbooks that interact with network elements
- A set of global variables stored in a Git repository that build the Ansible Inventory
# Components of GAP
As stated before, GAP is a platform and not a monolithic piece of software. GAP interacts with different OSS/BSS
systems already present in GÉANT and these are tightly integrated with the automation platform.
From a high-level point of view, GAP can be seen as the sum of the following parts:
- __A service database__ called CoreDB that stores the models of the service instances, called _subscriptions_.
Subscriptions are abstract objects that represent functional configuration constructs: the attributes that characterize
these objects are defined in the _domain models_.
- __An orchestration engine__ called Workflow Orchestrator that is capable of executing lists of steps called workflows.
- __A web interface__ for operators, called Orchestrator GUI, to intuitively launch and inspect workflows.
- __An automation engine__, Ansible, capable of interacting with network devices to configure them or to gather
operational information.
- __A set of authoritative systems to manage resources__:
- IP addresses and DNS names (Infoblox)
- Physical interfaces (Netbox)
To interact with these external systems, specific plugins or wrappers are in place.
An overview of how these components interact is depicted in the following diagram:
![](../../assets/images/TNC23_diagrams-WFO_GAP.drawio.png){width=300}
# Lightweight service orchestrator
This page describes the inner workings of the Lightweight Service Orchestrator
(LSO), which handles the interaction between GSO and Ansible.
## Motivation
For the deployment of new services in the GÉANT network, Ansible playbooks are
used to deploy configuration statements onto remote devices. To make this
interaction possible, LSO exposes an API that allows for the remote execution
of playbooks.
The need to externalise this interaction comes from the Python library used to
execute playbooks, which could introduce conflicting dependency versions. To
prevent this from happening, GSO and LSO are separate Python packages, each
with its own independent library dependencies.
## Inner workings
LSO uses <a href="https://ansible.readthedocs.io/projects/runner/en/latest/"
target="_blank">`ansible-runner`</a> for the execution of Ansible playbooks.
This package fully dictates the way in which GAP interacts with Ansible itself.
LSO only introduces an API with a single REST endpoint that exposes its
functionality.
In the case of GAP, all Ansible playbooks operate without a static inventory
that contains all relevant `group_vars` and `host_vars`. Instead, the inventory
is passed to the API endpoint for executing a playbook, and contains all
required `host_vars`. Other information relevant to the playbook is passed
through the API by making use of `extra_vars`. In virtually all cases, the
`extra_vars` will at least consist of the subscription object that is being
deployed, and assisting variables, such as 'verb' used to express an operation.
As an example, the following object is passed to the Ansible playbook for the
deployment of a new router in the network.
``` python
extra_vars = {
    "subscription": {
        "product": {
            "product_id": "27c9dc35-f0fa-4901-bda4-65df5bb7499d",
            "name": "Router",
            "description": "A Router",
            "product_type": "Router",
            "tag": "RTR",
            "status": "active",
            "created_at": "2024-01-24T15:47:13+00:00",
            "end_date": None,
        },
        "customer_id": "8f0df561-ce9d-4d9c-89a8-7953d3ffc961",
        "subscription_id": "b57cbbc8-e8d1-47f8-add6-7923ecd7e3d5",
        "description": "Router SrzptDtKBIFGijnHrglQ.flores.bb.geant.net",
        "status": "provisioning",
        "insync": False,
        "start_date": None,
        "end_date": None,
        "note": None,
        "router": {
            "name": "RouterBlock",
            "subscription_instance_id": "09d6bea9-8c79-4e75-9a69-ef249bb9de5e",
            "owner_subscription_id": "b57cbbc8-e8d1-47f8-add6-7923ecd7e3d5",
            "label": None,
            "router_fqdn": "SrzptDtKBIFGijnHrglQ.flores.bb.geant.net",
            "router_ts_port": 4223,
            "router_access_via_ts": True,
            "router_lo_ipv4_address": "74.95.57.63",
            "router_lo_ipv6_address": "ac6f:7008:40d3:d431:bcc4:2eac:b443:f6b8",
            "router_lo_iso_address": "49.51e5.0001.0740.9505.7063.00",
            "router_role": "amt",
            "router_site": {
                "name": "SiteBlock",
                "subscription_instance_id": "874ffb0b-cf55-49ea-810f-7268c02891fa",
                "owner_subscription_id": "324239ea-555b-464d-bfde-54666470d71d",
                "label": None,
                "site_name": "flores",
                "site_city": "Whitemouth",
                "site_country": "Zimbabwe",
                "site_country_code": "BB",
                "site_latitude": "45.39258",
                "site_longitude": "137.727838",
                "site_internal_id": 9881,
                "site_bgp_community_id": 8738,
                "site_tier": "1",
                "site_ts_address": "137.105.143.190",
            },
            "vendor": "nokia",
        },
    },
    "dry_run": True,
    "verb": "deploy",
    # Adjacent string literals are joined into a single comment string.
    "commit_comment": "GSO_PROCESS_ID: 549aae60-0574-4c5a-a736-00c83fdb446a - "
    "TT_NUMBER: TT#1987043028032905 - Deploy base config",
}
```
In this example, four top-level keys are included: `subscription`, `dry_run`,
`verb`, and `commit_comment`. In order, these are used for the following.
The `subscription` key includes a dictionary representation of the subscription
that is being provisioned. In the case of a router, `router` contains
information about the subscription object, with its child key `router_site` that
contains information about the site at which this router is deployed.
Information about this router site comes from the related site subscription
which is already 'deployed' in GSO.
To distinguish between practice runs and actual deployments, the variable
`dry_run` is included. The difference between the two is whether configuration
is committed. With a dry run, configuration is only checked, and not committed
to the remote machine. When `dry_run` is set to `False`, the configuration is
checked and then committed.
To distinguish between different actions that can be taken with service
deployments, 'verbs' are introduced. In the example, the verb is set to 'deploy'
to provision a new service. Other examples of verbs can include 'deactivate',
'modify', or 'terminate'.
The `commit_comment` is used for bookkeeping purposes on the remote machines.
This can be used for debugging or accounting purposes, among others. It always
includes the process ID of the workflow that is related to an operation, and the
associated trouble ticket number.
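
The four top-level keys described above could be assembled as in the following sketch. The helper name `build_extra_vars` and its parameters are hypothetical. The layout of `commit_comment` mirrors the example value shown earlier and is an assumption about the general format.

``` python
from typing import Any


def build_extra_vars(
    subscription: dict[str, Any],
    process_id: str,
    tt_number: str,
    action: str,
    verb: str = "deploy",
    *,
    dry_run: bool = True,
) -> dict[str, Any]:
    """Assemble the `extra_vars` passed to a playbook (hypothetical helper)."""
    # Assumption: the comment always follows the layout of the example above.
    commit_comment = f"GSO_PROCESS_ID: {process_id} - TT_NUMBER: {tt_number} - {action}"
    return {
        "subscription": subscription,
        "dry_run": dry_run,  # True: only check configuration, do not commit
        "verb": verb,  # for example 'deploy', 'modify', or 'terminate'
        "commit_comment": commit_comment,
    }


extra_vars = build_extra_vars(
    {"description": "Router SrzptDtKBIFGijnHrglQ.flores.bb.geant.net"},
    process_id="549aae60-0574-4c5a-a736-00c83fdb446a",
    tt_number="TT#1987043028032905",
    action="Deploy base config",
)
```

Defaulting `dry_run` to `True` is a design choice of this sketch: a caller has to opt in explicitly before configuration is committed.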
### The full API request
The `extra_vars` from the previous section is only one piece of the puzzle. An
example of a full-fledged API request to LSO is given below.
``` json
{
    "playbook_name": "deploy_a_service.yaml",
    "callback": "https://orchestrator.gap.geant.org/api/processes/(…)/callback/(…)",
    "inventory": {
        "all": {
            "hosts": {
                "edge1-host": {
                    "example-var": "A value",
                    "another-var": "Totally optional, and can differ per host"
                },
                "edge2-host": null  // Note that the `null` is a mandatory YAML-restriction
            }
        }
    },
    "extra_vars": {
        …as shown above
    }
}
```
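
A request body with this shape could be assembled as in the sketch below. It assumes that `extra_vars` sits at the top level of the request, next to `inventory`. The helper name `build_lso_request` and the callback URL are hypothetical, and the actual HTTP call to LSO is left out.

``` python
import json
from typing import Any, Optional


def build_lso_request(
    playbook_name: str,
    callback_url: str,
    hosts: dict[str, Optional[dict[str, Any]]],
    extra_vars: dict[str, Any],
) -> dict[str, Any]:
    """Build the body of a playbook request for LSO (hypothetical helper)."""
    return {
        "playbook_name": playbook_name,
        "callback": callback_url,
        # Hosts without host_vars map to None, which serialises to `null`.
        "inventory": {"all": {"hosts": hosts}},
        "extra_vars": extra_vars,
    }


payload = build_lso_request(
    "deploy_a_service.yaml",
    "https://orchestrator.example.org/callback",  # placeholder URL
    {"edge1-host": {"example-var": "A value"}, "edge2-host": None},
    {"dry_run": True, "verb": "deploy"},
)
body = json.dumps(payload)
```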
## Code documentation
Code documentation for LSO can be found
<a href="https://workfloworchestrator.org/lso" target="_blank">here</a>.
## Deployment within GÉANT
For the deployment in GÉANT, LSO runs inside a Docker container. The Dockerfile
used to build this container is available <a href=
"https://gitlab.software.geant.org/goat/gap/lso/-/blob/develop/Dockerfile"
target="_blank">here</a>.
When building the Docker image, some Ansible roles and collections are installed
that are required for interacting with Juniper and Nokia equipment. For another
organisation that wants to use LSO in its deployment, it is highly
recommended to use this Dockerfile as a starting point. From this, another Docker
image can be built with custom Ansible requirements pre-installed.
A custom image also opens up the possibility to include an Ansible inventory, if
so desired. Note, however, that this introduces a requirement to rebuild LSO
every time the inventory is updated, or to have it included as a volume mount
inside the running container. Including a dynamic inventory with every API call
is therefore the recommended way to go.
# Netbox
Netbox is a DCIM capable of managing inventory of Sites, Racks, Devices, Ports, etc.
GAP makes use of Netbox to store information about which physical interfaces are in use.
Some strong assumptions are made about GAP:
- All the nodes have a fixed configuration that does not change often over time
- The only point where operators can assign interfaces is GAP
- The legacy Juniper network is not managed by GAP
## Network hardware
The new routers have a static hardware configuration which depends on the tier of the site at which they are installed.
The following table summarizes the possible configurations:
| Tier | Chassis | Control plane | Switching fabric | Linecard |
|------|---------|---------------|------------------|----------------------------------|
| 1 | SR7s | 2x CPM2-s | 4x SFM7-s | 2x XCM2 - 2x XMA2-s (36p QSFPDD) |
| 2 | SR7s | 2x CPM2-s | 4x SFM7-s | 2x XCM2 - 2x XMA2-s (36p QSFPDD) |
| 3 | TBD | TBD | TBD | TBD |
| 4 | TBD | TBD | TBD | TBD |
# Workflow Orchestrator
On this page, a description is given of the mechanisms that govern:
- Workflow Orchestrator
- CoreDB
- The framework for the integration of plugins
(TBA)
# Configuration decomposition
## Functional decomposition
Every configuration statement expresses a particular function. This taxonomy decomposes the various types of
configuration:
![](../../assets/images/TNC23_diagrams-ConfigSlicing.drawio.png){ width='300' }
### Base configuration
Base configuration is present on all nodes. It performs the same function everywhere, but is adapted to particular
values or characteristics of the node itself.
Examples are:
- SNMP communities: these are the same for all routers, including the client list
- AAA (local user and RADIUS authentication): the same on all routers, but the source address of RADIUS requests is
the loopback of the specific device
Once the router has been configured with the base configuration, it is ready for the installation of one or more IP
trunks. Doing so will connect it to the rest of the network.
At this point, the router's loopback is reachable via IGP and the new device can be inserted in the network by
configuring the iBGP mesh and adding it to tooling.
✨ Now the router is in service, ready to deliver services. ✨
### Service prerequisites
L3VPNs must exist on the router before customer-facing services can be configured. There might also be specific
communities or routing policies that are generic and are considered as prerequisite for customer connections.
This functional block is responsible for this.
### Service
This block represents a service on the edge: an L2 circuit or a BGP peer. This block is itself decomposed into other
parts, according to the functional parts needed by configuration. In general, a meta-service called a demarcation point is
used to identify the physical interface on which the service should be delivered:
![](../../assets/images/TNC23_diagrams-Service_stitching.drawio.png){ width='400' }
A service is always delivered on a VLAN (which can be the native VLAN in some particular cases, for example an IP trunk).
The VLAN is called a _Service Delivery Point_.
## Separation of data and configuration
Configuration can be seen as the composition of an abstract data structure that represents the business logic, and a
template that matches the router's specific configuration dialect. In the following example, DNS servers on a Juniper and
a Nokia router are configured:
DNS servers (abstract data):
```yaml
system_name_servers:
- 192.168.1.1
- 192.168.1.254
```
Junos template:
```jinja
system {
    replace: name-server {
        {%- for name_server in system_name_servers %}
        {{ name_server }};
        {%- endfor %}
    }
}
```
Nokia template:
```
To be added...
```
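
To illustrate the separation of data and configuration without a template engine, the sketch below renders the same `system_name_servers` data into the Junos dialect using plain Python. It is a stand-in for the Jinja template above, not the mechanism used by the playbooks, and the function name `render_junos_name_servers` is hypothetical.

```python
def render_junos_name_servers(system_name_servers: list[str]) -> str:
    """Render abstract DNS server data into a Junos-style configuration block."""
    lines = ["system {", "    replace: name-server {"]
    # One statement per name server, taken from the abstract data.
    lines += [f"        {server};" for server in system_name_servers]
    lines += ["    }", "}"]
    return "\n".join(lines)


# The abstract data is independent of any vendor dialect.
system_name_servers = ["192.168.1.1", "192.168.1.254"]
junos_config = render_junos_name_servers(system_name_servers)
```

Supporting another vendor then means adding a second renderer that consumes the same list, while the data itself stays unchanged.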