Verified commit 889dd3fb, authored by Karel van Klink

Move documentation for workflows to separate files

This allows us to link to them from the orchestrator GUI
parent 9ada9933
2 merge requests: !23 Move documentation for workflows to separate files, !22 Move documentation for workflows to separate files
Pipeline #89699 failed
Showing
with 72 additions and 331 deletions
# About this section # About this section
All the information regarding modeling, workflows and Ansible mechanics is described in this section. All the information regarding modeling and Ansible mechanics is described in this section.
Next to this, also troubleshooting and maintenance are included.
The structure is: The structure is:
- WFO - WFO
- Modelling - Modelling
- Workflow
- Maintenance
- Troubleshoot
- Ansible - Ansible
- Design - Design
- Low level description - Low level description
- Troubleshoot
# Integration with Infoblox # Integration with Infoblox
TBA The Infoblox service in GAP takes care of IP resources when creating routers,
IP trunks, and other customer-facing services.
# Integration with Kentik # Integration with Kentik
TBA Routers are added to Kentik when they have a PE role. This can be either when
creating a new PE router, or upgrading an existing P router to PE.
When the router is added to Kentik, a placeholder license is applied. Then,
a member of the service management team is able to run the
`modify_router_kentik_license` workflow to apply a different license.
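
The rule for when a router is added to Kentik can be sketched as a small predicate. This is a minimal illustration, not the GSO implementation: the function name and the role strings `"P"` and `"PE"` are assumptions made for the example.

```python
from typing import Optional


def needs_kentik_registration(new_role: str, old_role: Optional[str] = None) -> bool:
    """Return True when a router should be added to Kentik.

    Hypothetical helper: covers a newly created PE router (old_role is None)
    and an existing P router that is upgraded to PE.
    """
    return new_role == "PE" and old_role != "PE"
```

A router that already has the PE role is not registered again, so the placeholder license is only applied once.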
# Integration with LibreNMS # Integration with LibreNMS
TBA When a router is created or terminated, it is added or removed from LibreNMS.
# Integration with Netbox # Integration with Netbox
TBA Netbox is used for bookkeeping of interface usage, and suggesting options to
the operator for available LAGs.
...@@ -23,60 +23,3 @@ The relevant attributes for an IPTrunk are the following: ...@@ -23,60 +23,3 @@ The relevant attributes for an IPTrunk are the following:
| `iptrunk_side_ae_geant_a_sid` | `str` | The service ID of the interface. | | `iptrunk_side_ae_geant_a_sid` | `str` | The service ID of the interface. |
| `iptrunk_side_ae_members` | `list[str]` | A list of interface members that make up the aggregated Ethernet interface. | | `iptrunk_side_ae_members` | `list[str]` | A list of interface members that make up the aggregated Ethernet interface. |
| `iptrunk_side_ae_members_description` | `list[str]` | The list of descriptions that describe the list of interface members. | | `iptrunk_side_ae_members_description` | `list[str]` | The list of descriptions that describe the list of interface members. |
## Workflows
### Deployment
This is the workflow that brings the subscription from INACTIVE to PROVISIONING and finally to ACTIVE.
The deployment of a new IPtrunk consists of the following steps:
- Fill the form with the necessary fields:
- SID
- Type
- Speed
- Nodes
- LAG interfaces with description
- LAG members with description
- WFO will query IPAM to retrieve the IPv4/IPv6 Networks necessary for the trunk. The container to use is specified in
`oss-params.json`
- The configuration necessary to deploy the LAG is generated and applied to the destination nodes using the Ansible
playbook `iptrunks.yaml`. This is done first in dry-run mode (without committing) and then in real mode, committing the
configuration. The commit message contains the `subscription_id` and the `process_id`. This step also includes the
configuration necessary to enable LLDP on the physical interfaces.
- Once the LAG interface is deployed, another Ansible playbook is called to verify that IP traffic can actually flow
over the trunk ( `iptrunk_checks.yaml`)
- Once the check is passed, the ISIS configuration will take place using the same `iptrunks.yaml`. Also in this case
first there is a dry run and then a commit.
- After this step the ISIS adjacency gets checked using again `iptrunks_checks.yaml`
The trunk is deployed with an initial ISIS metric of 9000 to prevent traffic from passing over it.
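
The order of playbook runs described above can be sketched as a simple plan. This is an illustration under assumptions: the `Step` class, the purpose strings, and the commit comment format are hypothetical, and the checks playbook is spelled two ways in the text, so the sketch picks one spelling.

```python
from dataclasses import dataclass

INITIAL_ISIS_METRIC = 9000  # from the text: keeps traffic off the new trunk
DEPLOY_PLAYBOOK = "iptrunks.yaml"
# The text spells the checks playbook two ways; this sketch picks one.
CHECKS_PLAYBOOK = "iptrunk_checks.yaml"


@dataclass(frozen=True)
class Step:
    playbook: str
    purpose: str
    dry_run: bool


def plan_deployment() -> list[Step]:
    """Return the playbook runs in the order the steps above describe."""
    return [
        Step(DEPLOY_PLAYBOOK, "deploy LAG and LLDP", dry_run=True),
        Step(DEPLOY_PLAYBOOK, "deploy LAG and LLDP", dry_run=False),
        Step(CHECKS_PLAYBOOK, "check IP traffic over the trunk", dry_run=False),
        Step(DEPLOY_PLAYBOOK, "deploy ISIS", dry_run=True),
        Step(DEPLOY_PLAYBOOK, "deploy ISIS", dry_run=False),
        Step(CHECKS_PLAYBOOK, "check ISIS adjacency", dry_run=False),
    ]


def commit_comment(subscription_id: str, process_id: str) -> str:
    # Hypothetical format; the text only states that both IDs are included.
    return f"GSO process {process_id} for subscription {subscription_id}"
```

Every configuration change appears twice in the plan, first as a dry run and then as a commit, while the check playbooks run once.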
### Termination
This workflow deletes all the configuration related with an IPtrunk from the network and brings the subscription from
`ACTIVE` to `TERMINATED`. The steps are the following:
- Modify the ISIS metric of the trunk so as to evacuate traffic, and wait for confirmation from an operator.
- Delete all the configuration (first dry then actual deletion):
- LAG and members of the LAG
- reference in LLDP protocol (if Juniper)
- reference in ISIS protocol
- Delete the IPv4/IPv6 networks from IPAM
### Modification
To modify IP trunks, two different workflows exist:
- Modify ISIS metric - modifies protocols/ISIS/interface
- Modify Trunk interface - modifies lag interfaces and members. This is used to increase capacity or to change
SID/interface descriptions.
In both cases, the strategy is to re-apply the necessary template to the configuration construct. Because a "replace"
strategy is used, only the necessary modifications will be applied.
At the time of writing, the deletion of members from an existing IPtrunk is not supported.
### Migration
TBA
...@@ -16,4 +16,41 @@ classDiagram ...@@ -16,4 +16,41 @@ classDiagram
+isEligibleToEnrol() +isEligibleToEnrol()
+getSeminarsTaken() +getSeminarsTaken()
} }
``` ```
\ No newline at end of file
## Node deployment
A node consists of one or more routers, a switch, and a terminal server.
In general -- as laid out more extensively
<a href="https://wiki.geant.org/display/NETENG/001+-+Topology+and+physical+layout" target="_blank">here</a>
(behind login) -- a PoP consists of:
* One or two routers
* One switch
* One terminal server
Globally, the workflow for a new site is as follows:
1. Deploy terminal server:
1. Generate base configuration from GitLab
2. Ship the device to its location
3. Verify reachability and insert in LibreNMS
2. Deploy PoP router in a 'core' fashion
1. Rack it up and configure the hardware
2. Connect the router to the terminal server via both a console connection, and FXP
3. Deploy base configuration using GAP
3. Deploy PoP switch
1. Rack it up and configure the hardware
2. Connect the switch to the terminal server via both a console connection, and FXP
3. Deploy base configuration using GAP
4. Deploy the PoP interconnect between router and switch
1. Set up a physical connection between router and switch
2. Deploy configuration using GAP
5. Deploy IP trunks to connect the router to the rest of the network
1. Set up a physical connection
2. Deploy configuration using GAP
6. Update the iBGP mesh to include the new router, promoting it to an edge router
1. Deploy configuration using GAP
2. Using GAP, insert the devices in LibreNMS
In the context of the automation platform, the PoP interconnects mentioned are modeled as separate objects.
...@@ -6,6 +6,19 @@ which is the location one is hosted at. Virtually all services depend on an ...@@ -6,6 +6,19 @@ which is the location one is hosted at. Virtually all services depend on an
active router subscription. As a result, this is one of the most fundamental active router subscription. As a result, this is one of the most fundamental
subscription instances in GSO. subscription instances in GSO.
From a bird's-eye view, the process of deploying a new router in the network is as follows:
1. Manually configure the router such that it is reachable from out-of-band (OOB).
2. Upgrade the router to the most recent OS.
3. Deploy base configuration.
4. Configure trunks to connect the router to the network.
5. Update the protocol meshes (such as iBGP).
6. Promote the router to the production environment.
![Provisioning a router in WFO](../../assets/images/WFO_deploy_router.png)
*WFO provisions a new router by following the steps shown here.*
## Modelling and attributes ## Modelling and attributes
The attributes of a router are as follows: The attributes of a router are as follows:
...@@ -22,12 +35,7 @@ The attributes of a router are as follows: ...@@ -22,12 +35,7 @@ The attributes of a router are as follows:
| `router_site` | `SiteBlock` | The site that this router is located at. | | `router_site` | `SiteBlock` | The site that this router is located at. |
| `vendor` | `RouterVendor` | The vendor of a router, either Juniper or Nokia. | | `vendor` | `RouterVendor` | The vendor of a router, either Juniper or Nokia. |
## Workflows ## Deployment
A router supports different workflows to take it through the subscription
lifecycle.
### Deployment
For the deployment of a router, two workflows are required to be run. The For the deployment of a router, two workflows are required to be run. The
first is creation of the router subscription itself, and preparing it for first is creation of the router subscription itself, and preparing it for
...@@ -37,105 +45,3 @@ added to the iBGP mesh of existing routers in the network. ...@@ -37,105 +45,3 @@ added to the iBGP mesh of existing routers in the network.
!!! tip !!! tip
The creation of a new router also requires an active site subscription, The creation of a new router also requires an active site subscription,
ensure that this is already in place before continuing. ensure that this is already in place before continuing.
### Creation
To add a new router to the GÉANT network, the `create_router` workflow must
be executed first. The intake form for this workflow requires the following
fields to be filled in:
* Trouble ticket number
* Router vendor
* Router site
* Hostname
* Terminal server port
* Router role
The hostname is validated, by checking that the resulting FQDN is not
already taken in IPAM.
!!! warning
The validation only checks whether the FQDN is already taken in IPAM,
**not** whether it is registered somewhere on the internet.
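
As a rough sketch of the hostname validation, the check below compares a candidate FQDN against names already registered. The in-memory set is a stand-in for an IPAM lookup, and both function names are hypothetical. Names are normalised first because DNS names are case-insensitive and may be written with a trailing dot.

```python
def normalise_fqdn(fqdn: str) -> str:
    """Lower-case the name and drop surrounding whitespace and a trailing dot."""
    return fqdn.strip().rstrip(".").lower()


def fqdn_is_available(fqdn: str, taken_in_ipam: set[str]) -> bool:
    """Return True if the FQDN is not yet registered.

    `taken_in_ipam` stands in for a real IPAM query, which is not shown here.
    """
    taken = {normalise_fqdn(name) for name in taken_in_ipam}
    return normalise_fqdn(fqdn) not in taken
```

As the warning states, a check like this says nothing about whether the name is registered elsewhere on the internet.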
When the workflow is started, a subscription object is first instantiated in
the service database, containing all the information that was provided in
the input form at the beginning. Then, the loopback addresses are allocated
in IPAM, which results in both the IPv4 and IPv6 addresses in the product model.
Once allocated, the first dry run of deploying router configuration takes place.
An Ansible playbook is run, with all the attributes of the new router. This
is where GSO communicates with LSO, and the router configuration is checked,
but not committed to the machine.
After the dry run, the operator is presented with a view of the outcome of
this playbook. This is their opportunity to verify successful execution of
the Ansible playbook, and whether the difference in configuration is as
expected. If not, this is their chance to abort the workflow, and no harm is
done to the router.
When the operator confirms the outcome of this playbook execution, the
playbook runs once again, but it will also commit the configuration after
checking. With the new router configured, the IPAM resources are verified to
ensure this external system is configured correctly.
If the new router is a Nokia, all its interfaces are added to Netbox. This
is done to keep track of interface reservations and bookkeeping. For Juniper
routers, this does not need to take place. These existing devices are not
migrated into Netbox.
Finally, an Ansible playbook is run to verify that the connectivity and
optical power levels of the router are in order. Once this is completed, the
router is moved into an `ACTIVE` state.
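
The dry run, operator confirmation, and commit sequence can be sketched as follows. This is a simplified illustration: `run_playbook` and `confirm` are hypothetical callables standing in for the call to LSO and for the operator's confirmation step.

```python
from typing import Callable


def deploy_with_confirmation(
    run_playbook: Callable[[bool], str],
    confirm: Callable[[str], bool],
) -> str:
    """Run a dry run, ask for confirmation, and commit only when confirmed.

    run_playbook(dry_run) returns the playbook outcome, such as a diff.
    confirm(outcome) returns True when the operator accepts the dry run.
    """
    outcome = run_playbook(True)  # check only, nothing is committed
    if not confirm(outcome):
        return "aborted"  # the router is left untouched
    run_playbook(False)  # same playbook, now committing
    return "committed"
```

If the operator rejects the dry-run outcome, the second call never happens, which is why aborting at that point does no harm to the router.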
### Update iBGP mesh
Once a new router is added to the network, it must become reachable by all
other devices. To achieve this, the `update_ibgp_mesh` workflow must be
executed. This workflow will add the new P router to all PE routers in the
network, and add all existing PE routers to the new P router. The only input
this workflow takes is a trouble ticket number. All other required
information is already in the service database.
The workflow will run 5 Ansible playbooks:
1. Check: add P router to all PE routers
2. Deploy: add P router to all PE routers
3. Check: add all PE routers to P router
4. Deploy: add all PE routers to P router
5. Verify: check that the iBGP has come up
Once these playbooks have been run successfully, the new P router is added
to LibreNMS. Finally, the subscription model of the router is updated such that
`router_access_via_ts` is set to `False`. This is because the router is now
reachable by other machines by its loopback address. Using out of band access is
therefore not needed anymore.
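
The five playbook runs and the final model update can be sketched as below. This is a minimal sketch under assumptions: the `run` callable is a hypothetical stand-in for executing a playbook, the router is represented as a plain dictionary, and the LibreNMS step is only marked by a comment.

```python
from typing import Callable

MESH_STEPS = [
    ("check", "add P router to all PE routers"),
    ("deploy", "add P router to all PE routers"),
    ("check", "add all PE routers to P router"),
    ("deploy", "add all PE routers to P router"),
    ("verify", "check that the iBGP has come up"),
]


def update_ibgp_mesh(router: dict, run: Callable[[str, str], bool]) -> dict:
    """Run the five steps in order and stop at the first failure."""
    for mode, action in MESH_STEPS:
        if not run(mode, action):
            raise RuntimeError(f"{mode} failed: {action}")
    # Adding the router to LibreNMS would happen here (not shown).
    # The router is now reachable via its loopback address.
    return {**router, "router_access_via_ts": False}
```

Raising on the first failed step means `router_access_via_ts` is only changed once all five runs have succeeded.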
### Redeploy
When a new router is deployed, it is loaded with the current version of
configuration that contains the bare necessities. For various reasons, this
template may change, and the resulting configuration follows from this. To
update a router 'in the wild' where this change should be reflected, the
workflow `redeploy_base_config` is used.
This workflow only takes a trouble ticket number as initial input, and
deploys the base configuration, first as a dry run. After confirmation by an
operator, the configuration is committed to the machine, and this completes
the workflow.
### Termination
To terminate a router, the workflow `terminate_router` is used. The operator
is presented with an input form that once again requires a trouble ticket
number. In addition, the operator can choose whether this workflow should
remove all configuration on the router, and whether IPAM entries related to
this device should be removed.
The workflow consists of the following steps:
1. Deprovision IPAM resources (if selected).
2. Try to remove configuration from the router (if selected).
3. Commit removal of configuration (if selected).
4. For Nokia devices: remove interfaces from Netbox.
5. Set the subscription status to `TERMINATED`.
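
How the form options select the steps can be sketched as follows. The function and parameter names are hypothetical, and the vendor is compared as a plain string for simplicity.

```python
def termination_steps(
    vendor: str, remove_configuration: bool, clean_up_ipam: bool
) -> list[str]:
    """Return the steps that run for the chosen options, in order."""
    steps = []
    if clean_up_ipam:
        steps.append("deprovision IPAM resources")
    if remove_configuration:
        steps.append("try to remove configuration")
        steps.append("commit removal of configuration")
    if vendor.lower() == "nokia":
        steps.append("remove interfaces from Netbox")
    steps.append("set subscription status to TERMINATED")
    return steps
```

With both options deselected on a Juniper router, only the status change remains.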
...@@ -22,35 +22,3 @@ A Site object contains the following attributes: ...@@ -22,35 +22,3 @@ A Site object contains the following attributes:
| `site_bgp_community_id` &nbsp;&nbsp;&nbsp;&nbsp; | `int` | The BGP community ID of a site, used to advertise routes learned at this site. | | `site_bgp_community_id` &nbsp;&nbsp;&nbsp;&nbsp; | `int` | The BGP community ID of a site, used to advertise routes learned at this site. |
| `site_tier` | `SiteTier` | The tier of a site, which corresponds to installed equipment. | | `site_tier` | `SiteTier` | The tier of a site, which corresponds to installed equipment. |
| `site_ts_address` | `IPv4Address` | The address of the terminal server hosted at this site.<br/>It is used for out of band access to any equipment hosted here. | | `site_ts_address` | `IPv4Address` | The address of the terminal server hosted at this site.<br/>It is used for out of band access to any equipment hosted here. |
## Workflows
The Site subscription has three basic workflows: creation, modification, and
termination.
### Creation
The `create_site` workflow creates a new site object in the service database,
and sets the subscription lifecycle to `ACTIVE`. The attributes that are input
using the intake form of the workflow are stored, and nothing else happens.
### Modification
Attributes of an existing site can be modified using the `modify_site` workflow.
As a result, other subscriptions that rely on this site will have referenced
attributes updated as well.
!!! warning
Be aware that although this *does* update attributes in the services
database, it does **not** update any active subscription instances that are
already deployed. You will need to run additional workflows to update
subscriptions that depend on this change.
### Termination
The `terminate_site` workflow will take an existing and active site
subscription from an `ACTIVE` to a `TERMINATED` state. This requires all
dependent subscription instances to already be terminated. If this is not
the case, the workflow will be unavailable for an operator to run, accompanied
by an error message explaining this fact.
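
The precondition for `terminate_site` can be sketched as a check over dependent subscriptions. The dictionary keys used here (`subscription_id`, `depends_on`, `status`) are assumptions for illustration and do not reflect the actual service database schema.

```python
def blocking_subscriptions(site_id: str, subscriptions: list[dict]) -> list[str]:
    """Return IDs of subscriptions that depend on the site and are not terminated."""
    return [
        sub["subscription_id"]
        for sub in subscriptions
        if site_id in sub.get("depends_on", [])
        and sub["status"] != "TERMINATED"
    ]


def can_terminate_site(site_id: str, subscriptions: list[dict]) -> bool:
    """The site may be terminated only when nothing is blocking it."""
    return not blocking_subscriptions(site_id, subscriptions)
```

Returning the blocking IDs, instead of only a boolean, makes it possible to explain in the error message why the workflow is unavailable.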
# Workflow Orchestrator
## Modelling and workflows
### [Sites](./sites.md)
### [Routers](./routers.md)
### [IP trunks](./iptrunks.md)
## Maintenance
## Troubleshooting
...@@ -36,13 +36,10 @@ This site is organized in 4 main sections: ...@@ -36,13 +36,10 @@ This site is organized in 4 main sections:
- [Architecture](./architecture/index.md): covers the architecture of GAP - [Architecture](./architecture/index.md): covers the architecture of GAP
including all the components and the interactions between them including all the components and the interactions between them
- [Legacy GAP](./legacy_platform/overview.md): provides operational guides - [Admin guide](./admin_guide/index.md): provides detailed information of
of the legacy GAP platform based on Ansible and Jenkins the domain models in WFO and all the Ansible mechanics
- [User guide](./user_guide/index.md): provides operational guides of the - [Workflows](./workflow/index.md): provides operational guides of the
Workflow Orchestrator based GAP Workflow Orchestrator based GAP
- [Admin guide](./admin_guide/index.md): covers the detailed information of
the domain models in WFO, descriptions of the workflows, and all the
Ansible mechanics
The documentation provided in this portal is final and reviewed. For information The documentation provided in this portal is final and reviewed. For information
about the ongoing work please refer to the [internal wiki page](https://wiki. about the ongoing work please refer to the [internal wiki page](https://wiki.
......
# Deployment of a new router
# Overview
The current GAP is simple and its fundamental parts are:
- An <a href="https://gitlab.geant.net/neteam/network-automation/na-production/prod_network_inventory/-/tree/master"
target="_blank">Ansible inventory</a> stored in Git
- A set of
<a href="https://gitlab.geant.net/neteam/network-automation/na-production/prod_network_ansible" target="_blank">Ansible
playbooks</a> stored in Git
- An Ansible master instance to execute these playbooks
- A Jenkins instance to orchestrate Ansible
An overview of the platform is depicted in the following picture:
![GAP_overview](../assets/images/Legacy_GAP_diagrams.overview.drawio.png)
## Functionalities
Currently, GAP offers the following capabilities:
- Provisioning of nodes and IP trunks:
- Deployment of base configuration on a new router
- Deployment of a new trunk with metric=9000
- Insertion of a new router in the iBGP mesh
- Periodic checks of configuration:
- Verification of single stanza of configuration
- Others:
- Upgrade of Junos on Juniper routers with single or dual routing engines
# About this section
The GAP user guide section aims to describe step by step the mode of operation of the automation platform so that engineers can follow it when in doubt.
The structure is simple: one sub-section per product and one page for each workflow.
# IP trunks
## Deployment
## Modification
## Termination
## Migration
\ No newline at end of file
# Router deployment
From a bird's-eye view, the process of deploying a new router in the network is as follows:
1. Manually configure the router such that it is reachable from out-of-band (OOB).
2. Upgrade the router to the most recent OS.
3. Deploy base configuration.
4. Configure trunks to connect the router to the network.
5. Update the protocol meshes (such as iBGP).
6. Promote the router to the production environment.
![Provisioning a router in WFO](../../assets/images/WFO_deploy_router.png)
*WFO provisions a new router by following the steps shown here.*
# Routers
## Deployment
## Termination
# Sites
## Creation
## Deletion
\ No newline at end of file
# Node provisioning
A node consists of router(s), a switch, and a terminal server. In general -- as laid out more extensively
<a href="https://wiki.geant.org/display/NETENG/001+-+Topology+and+physical+layout" target="_blank">here</a> (behind
login) -- a PoP consists of:
* One or two routers
* One switch
* One terminal server
Globally, the workflow for a new site is as follows:
1. Deploy terminal server:
1. Generate base configuration from GitLab
2. Ship the device to its location
3. Verify reachability and insert in LibreNMS
2. Deploy PoP router in a 'core' fashion
1. Rack it up and configure the hardware
2. Connect the router to the terminal server via both a console connection, and FXP
3. Deploy base configuration using GAP
3. Deploy PoP switch
1. Rack it up and configure the hardware
2. Connect the switch to the terminal server via both a console connection, and FXP
3. Deploy base configuration using GAP
4. Deploy the PoP interconnect between router and switch
1. Set up a physical connection between router and switch
2. Deploy configuration using GAP
5. Deploy IP trunks to connect the router to the rest of the network
1. Set up a physical connection
2. Deploy configuration using GAP
6. Update the iBGP mesh to include the new router, promoting it to an edge router
1. Deploy configuration using GAP
2. Using GAP, insert the devices in LibreNMS
In the context of the automation platform, the PoP interconnects mentioned are modeled as separate objects.
# Activate IP trunk
When the SharePoint checklist of a trunk is completed, this workflow is run to
take the subscription from `PROVISIONING` to `ACTIVE`. The operator is asked
to give a URL to the completed checklist.
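
The activation step can be sketched as a guarded state change. This is an assumption-laden illustration: the subscription is a plain dictionary, the `checklist_url` key is hypothetical, and the URL check only verifies that an absolute http(s) URL was given.

```python
from urllib.parse import urlparse


def activate_iptrunk(subscription: dict, checklist_url: str) -> dict:
    """Move a trunk from PROVISIONING to ACTIVE, recording the checklist URL."""
    if subscription["status"] != "PROVISIONING":
        raise ValueError("only a PROVISIONING subscription can be activated")
    parsed = urlparse(checklist_url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError("checklist URL must be an absolute http(s) URL")
    return {**subscription, "status": "ACTIVE", "checklist_url": checklist_url}
```

The function returns a new dictionary and leaves its input unchanged, so a rejected URL cannot leave the subscription half-updated.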