With the rapid growth in network traffic, network operations teams need to find a way to keep their infrastructure agile and robust. Long gone are the days of network changes taking days or weeks to complete. The best way to meet these ever-changing business requirements is to pair a network operating system (NOS) that has automation as one of its foundational principles with a robust network automation framework. In this blog, I will walk you through setting up an entire network automation pipeline utilizing Ansible and GitLab to provision an ArcOS EVPN VxLAN fabric.
Before we jump into the set of tools needed to make operator workflows more dynamic, it is good to define a core set of attributes that any useful automation framework should have:
The good news is that the goals above are not unique to building scalable, robust network architectures. Similar goals were top of mind when the DevOps methodology and culture started to take hold in IT organizations. The culture shift that DevOps requires (increased communication among teams, the ability to iterate quickly, automated testing, and replicable processes, to name a few) rests on the exact same principles that network operators need to embrace to successfully migrate to this new automated paradigm. To pair with that culture change, NetDevOps also employs the following concepts:
The ArcOS architecture allows for an easy transition from the traditional CLI-based configuration approach to an automated workflow. Its OpenConfig-based data model has a consistent API for all northbound interfaces, giving operators flexibility in their deployment workflows. ArcOS supports full config parity across all programmatic interfaces, including NETCONF/RESTCONF, Python-based APIs, and open-source NetDevOps toolsets.
While there are a lot of different toolsets to choose from in the NetDevOps world, Ansible is the most popular for network configuration management. Ansible’s popularity is due to a few fundamental design choices, including but not limited to:
Having deployed many of the DevOps tools in production environments, I have found that Ansible is the easiest to operationalize. Ansible is also easier to phase into an existing environment, allowing for quick automation wins.
The ArcOS Ansible integration leverages the Debian/ONL kernel that is deployed with ArcOS, allowing ArcOS devices to be provisioned like a compute node. With most other network operating systems, as shown below, an operator would have to manage two or more different sets of connections, one for the compute infrastructure and another for the network devices, which results in overly complex playbooks.
Now compare that with the ArcOS Ansible modules shown on the right, which leverage the default Ansible connection attributes. This ‘first-class citizen’ approach lets the operator drop ArcOS modules into existing compute playbooks and extend tasks to the network infrastructure efficiently and seamlessly.
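There are two main ArcOS modules. The playbooks in this post use arcos_config to load configuration and arcos_command to run show commands.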
In this section, I will walk you through deploying a full NetDevOps pipeline to configure a full ArcOS topology, using open-source toolsets while adhering to the goals stated earlier.
In this example, I will be pushing all the needed configuration for 3 leaf nodes to be active in this EVPN VxLAN topology. The source of truth will be git (specifically GitLab in this case) and all configurations will be generated from a simple set of YAML files and pushed out to the devices via Ansible.
Let’s first examine how the configuration playbook is laid out:
---
- hosts: leafs
  gather_facts: false
  vars:
    load_operation: merge
  roles:
    - role: arcos-system
      tags: system
    - role: arcos-bgp
      tags: bgp
    - role: arcos-l2evpn
      tags: l2evpn
    - role: arcos-l3ints
      tags: l3ints
    - role: arcos-l2ints
      tags: l2ints
    - role: arcos-l3vrf
      tags: l3vrf
    - role: arcos-evpn-global
      tags: evpn
The configuration playbook relies on Ansible roles to make the playbook flexible. Using roles, aside from being an Ansible recommendation, makes it easy to include or exclude specific configuration aspects depending on the workflow. Each of the roles shown above has a very similar architecture, so we can use one as an example. I am picking the arcos-l2evpn role, and here is its task list:
---
- name: Push new L2 VRF Candidate config
  template:
    src: 'templates/l2vrf.j2'
    dest: '/tmp/.{{ inventory_hostname }}.xml'
  check_mode: no
- name: Apply candidate config
  arcos_config:
    src: "/tmp/.{{ inventory_hostname }}.xml"
    load_operation: '{{ load_operation }}'
    comment: "{{ comment | default('') }}"
  register: arcos_load
- name: Remove temp file
  file:
    state: absent
    path: '/tmp/.{{ inventory_hostname }}.xml'
  check_mode: no
- name: Remove old diff file
  file:
    path: "{{ playbook_dir }}/{{ inventory_hostname }}.txt"
    state: absent
  check_mode: no
  when: ansible_check_mode
  delegate_to: localhost
- name: Write diff to file
  blockinfile:
    block: "{{ ('\n').join(arcos_load.message.splitlines()[1:-1]) }}"
    dest: "{{ playbook_dir }}/{{ inventory_hostname }}.txt"
    create: yes
    marker: ""
  check_mode: no
  when: ansible_check_mode
  delegate_to: localhost
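There are essentially 3 key steps here, all executed locally on the ArcOS node: render the template into a candidate XML file, load that file with arcos_config, and remove the temporary file. The last two tasks are delegated to localhost and only run in check mode, where they save the returned diff to a text file.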
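The Jinja expression in the final "Write diff to file" task is ordinary Python string handling: it drops the first and last line of the module's message and keeps what is in between. Here is a minimal sketch of that behavior. The sample message, including its wrapper lines, is invented for illustration and is not real arcos_config output.

```python
def strip_first_and_last_line(message: str) -> str:
    """Mirror the join/splitlines expression used in the role's last task."""
    return "\n".join(message.splitlines()[1:-1])


# Hypothetical module message: the two wrapper lines are placeholders,
# only the diff lines in between are taken from this post.
sample_message = "\n".join([
    "<first wrapper line>",
    "+evpn anycast-gateway-mac aa:aa:aa:aa:aa:aa",
    "+evpn duplicate-mac-detection window 60",
    "<last wrapper line>",
])

diff_body = strip_first_and_last_line(sample_message)
print(diff_body)
```

A message with fewer than three lines yields an empty string, so the diff file would simply be empty in that case.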
It is important to note that the role generates an XML-encoded configuration to be loaded into the ArcOS configuration daemon. XML-encoded files are more efficient for the configuration daemon to process and allow for cleaner updates to the running configuration. We can abstract this detail away from the network operator by providing a template interface for the configuration. In this case, the arcos-l2evpn role renders the vlans list from the leaf group_var file, which is a very easy-to-read YAML file, into the correct XML encoding.
vlans:
  - id: 10
    state: present
  - id: 20
    state: present
  - id: 30
    state: present
  - id: 40
    state: present
  - id: 50
    state: present
  - id: 200
    state: present
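To make the YAML-to-XML rendering step concrete, here is a minimal Python sketch of the same idea using only the standard library. The role itself uses a Jinja2 template (templates/l2vrf.j2), which is not shown in this post, so the element names below (config, vlans, vlan, vlan-id) are hypothetical placeholders and not the real ArcOS schema.

```python
import xml.etree.ElementTree as ET

# Same shape as the vlans list in the leaf group_var file.
vlans = [{"id": vid, "state": "present"} for vid in (10, 20, 30, 40, 50, 200)]


def render_vlans(vlan_list):
    """Render a vlans list as an XML string.

    Element names are hypothetical placeholders, not the ArcOS data model.
    """
    root = ET.Element("config")
    container = ET.SubElement(root, "vlans")
    for vlan in vlan_list:
        if vlan["state"] != "present":
            continue  # assumption: this sketch skips entries not marked present
        entry = ET.SubElement(container, "vlan")
        ET.SubElement(entry, "vlan-id").text = str(vlan["id"])
    return ET.tostring(root, encoding="unicode")


candidate_xml = render_vlans(vlans)
print(candidate_xml)
```

The operator only ever edits the short list of IDs, and the XML structure stays fixed in one place.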
Using this approach, the same XML template will be used for each configuration push, ensuring predictable results. Each role shown in the main playbook follows this same structure, with the only difference being which variable file it uses to render a candidate config. For example, the arcos-bgp role uses a BGP structure defined in host_vars, since each node has unique values for the BGP config:
bgp:
  as: 65001
  router_id: "1.1.1.1"
  address_families:
    - name: IPV4_UNICAST
      networks:
        - "1.1.1.1/32"
    - name: L2VPN_EVPN
  ecmp: 32
  neighbors:
    - ip: 2.1.1.1
      peer_group: spine-evpn
    - ip: 2.1.1.2
      peer_group: spine-evpn
    - ip: 192.168.0.1
      peer_group: spine-underlay
    - ip: 192.168.0.7
      peer_group: spine-underlay
  peer_groups:
    - name: spine-evpn
      as: 65100
      local_address: "1.1.1.1"
      multihop: 5
      address_family: "L2VPN_EVPN"
    - name: spine-underlay
      as: 65100
      address_family: "IPV4_UNICAST"
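Because every neighbor refers to a peer group by name, a typo in the host_var would only surface when the template is rendered or loaded. A small pre-flight check can catch it earlier. The sketch below is one possible approach and is not part of the roles described here. The function name is hypothetical, and the dictionary is a trimmed copy of the data above, as it would look once the YAML is loaded.

```python
# Trimmed copy of the bgp host_var, as a Python dict.
bgp = {
    "as": 65001,
    "router_id": "1.1.1.1",
    "neighbors": [
        {"ip": "2.1.1.1", "peer_group": "spine-evpn"},
        {"ip": "2.1.1.2", "peer_group": "spine-evpn"},
        {"ip": "192.168.0.1", "peer_group": "spine-underlay"},
        {"ip": "192.168.0.7", "peer_group": "spine-underlay"},
    ],
    "peer_groups": [
        {"name": "spine-evpn", "as": 65100},
        {"name": "spine-underlay", "as": 65100},
    ],
}


def undefined_peer_groups(bgp_vars):
    """Return peer-group names used by neighbors but never defined."""
    defined = {group["name"] for group in bgp_vars.get("peer_groups", [])}
    used = {neighbor["peer_group"] for neighbor in bgp_vars.get("neighbors", [])}
    return sorted(used - defined)


missing = undefined_peer_groups(bgp)
print(missing or "all peer groups are defined")
```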
The YAML data models shown here are just a suggestion. They could easily be modified to fit a different source of truth or templating structure.
Ansible’s built-in templates and well-defined host attribute structure make it easier to write automation that is consistent, repeatable, and error-free. This allows the operations team to complete those dreaded weekend change windows and still have time to enjoy their weekend.
With the source playbooks complete, the next step is to build out the NetDevOps CI/CD pipeline:
This pipeline is executed using GitLab’s CI/CD environment, which provides a single tool for both the source code/configuration repository and the CI/CD pipeline executor. While alternative tools exist, converging on GitLab limits the number of tools involved, thereby simplifying the overall design.
Let’s examine the CI/CD pipeline stages:
stages:
  - test
  - confirm
  - deploy
  - validate
As we traverse this pipeline through its 4 stages, a stage doesn’t run unless the prior stage is successful. The test and confirm stages get executed each time a commit is pushed to any branch other than master, whereas the deploy and validate stages only run on a commit or merge into the master branch. This allows us to run tests on many branches and deploy the changes in a more controlled fashion using a single protected branch.
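The gating logic can be modeled in a few lines of Python. This is only a toy model of the behavior described above, going by the rules entries in the job definitions in this post. It is not how GitLab evaluates pipelines internally, and the function names are hypothetical.

```python
def stages_for_branch(branch):
    """Stages whose jobs are selected for a commit on the given branch."""
    if branch == "master":
        return ["deploy", "validate"]
    return ["test", "confirm"]


def run_pipeline(branch, results):
    """Return the stages that actually run.

    results maps a stage name to True (success) or False (failure).
    Stages missing from results are assumed to succeed.
    """
    executed = []
    for stage in stages_for_branch(branch):
        executed.append(stage)
        if not results.get(stage, True):
            break  # a failed stage stops everything after it
    return executed


print(run_pipeline("feature-vlan-60", {"test": True}))
print(run_pipeline("feature-vlan-60", {"test": False}))
print(run_pipeline("master", {}))
```

A failing test stage on a feature branch means the confirm stage never produces a diff, so there is nothing to review or merge.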
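The test stage consists of 3 steps: bring up the virtual test topology, push the configuration to it, and run the validation playbook against it.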
config_test:
  stage: test
  script:
    - curl http://10.0.2.2:5000/vagrant_up
    - ansible-playbook -i ansible_inv arcos-evpn.yml -e comment="[$CI_COMMIT_SHORT_SHA]" --limit test_lab
    - ansible-playbook -i ansible_inv validate.yml --limit test_lab
  tags:
    - evpn
  rules:
    - if: $CI_COMMIT_BRANCH != 'master'
A quick note on step 3: using Ansible for both the configuration push and the validation limits the number of toolsets in the pipeline, in line with the goal of keeping things simple.
- hosts: leafs
  gather_facts: false
  tasks:
    - name: Get lldp neighbors
      arcos_command:
        command: 'show lldp interface *'
      register: lldpneigh
    - name: parse neighbor map
      set_fact:
        is_correct: "{{ lldpneigh['message']['data']['openconfig-lldp:lldp']['interfaces']['interface'] | parse_arcos_lldp( inventory_hostname) }}"
    - name: check neighbor map
      assert:
        that: is_correct[0]
    - name: Grab BGP peers output
      arcos_command:
        command: "show network-instance default protocol BGP default all-neighbor | select state session-state | select state local-as "
      register: bgp_output
    - name: parse BGP output
      set_fact:
        bgp_neighs: "{{ bgp_output['message']['data']['openconfig-network-instance:network-instances']['network-instance'][0]['protocols']['protocol'][0] }}"
    - name: verify number of BGP peers
      assert:
        that: bgp_neighs['bgp']['arcos-openconfig-bgp-augments:all-neighbors']['all-neighbor'] | count == 4
    - name: verify RIB
      arcos_command:
        command: 'show network-instance Tenant-A rib IPV4 ipv4-entries entry {{ item }}'
      register: l2rib
      loop:
        - '11.11.11.11/32'
        - '22.22.22.22/32'
        - '33.33.33.33/32'
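The example shown above checks three things: that the LLDP neighbor map parsed by the parse_arcos_lldp filter is correct for each leaf, that each leaf reports four BGP neighbors, and that the Tenant-A RIB can be queried for the prefixes 11.11.11.11/32, 22.22.22.22/32, and 33.33.33.33/32.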
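The BGP peer-count assertion is just a walk through a nested structure followed by a length check. Here is a rough Python equivalent that uses the same keys as the playbook. The sample payload is invented for illustration. In particular, the fields inside each neighbor entry are hypothetical and do not reflect actual ArcOS output.

```python
EXPECTED_BGP_NEIGHBORS = 4


def bgp_neighbor_count(command_output):
    """Follow the same key path as the playbook's set_fact and assert tasks."""
    instances = command_output["message"]["data"][
        "openconfig-network-instance:network-instances"]["network-instance"]
    protocol = instances[0]["protocols"]["protocol"][0]
    neighbors = protocol["bgp"][
        "arcos-openconfig-bgp-augments:all-neighbors"]["all-neighbor"]
    return len(neighbors)


# Hypothetical payload: only the key path comes from the playbook.
sample_output = {"message": {"data": {
    "openconfig-network-instance:network-instances": {"network-instance": [
        {"protocols": {"protocol": [
            {"bgp": {"arcos-openconfig-bgp-augments:all-neighbors": {
                "all-neighbor": [
                    {"address": ip}  # placeholder field name
                    for ip in ("2.1.1.1", "2.1.1.2", "192.168.0.1", "192.168.0.7")
                ]}}}
        ]}}
    ]}}}}

count = bgp_neighbor_count(sample_output)
assert count == EXPECTED_BGP_NEIGHBORS, f"expected 4 BGP neighbors, found {count}"
print(count)
```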
After the virtual topology has been configured successfully and the validate playbook above has executed each task successfully, the CI/CD pipeline calls the confirm step. The confirm step generates a human-readable diff of the configuration that will be applied once all the templates have been rendered.
Here is the pipeline configuration for this step:
config_confirm:
  stage: confirm
  script:
    - ansible-playbook -i ansible_inv arcos-evpn.yml --check --diff --limit production
  tags:
    - evpn
  artifacts:
    paths:
      - ./*.txt
  rules:
    - if: $CI_COMMIT_BRANCH != 'master'
The key part of this step is calling the Ansible playbook with the --check and --diff flags. The ArcOS Ansible modules conform to Ansible check_mode by applying the candidate configuration to the system but not committing it. Instead, the output of ‘show configuration diff’ is returned to the playbook. We are also utilizing GitLab’s artifacts feature here to store these config diffs for each host. This provides a convenient way for the network operations team to look at the proposed config before it gets pushed into production in the next stage of the pipeline. If you are trying to roll out a network change on a Friday evening, you will appreciate the benefits of this. For example, the config diff for the arcos-evpn-global role for leaf1 in this case looks like:
+evpn anycast-gateway-mac aa:aa:aa:aa:aa:aa
+evpn duplicate-mac-detection window 60
+evpn duplicate-mac-detection threshold 7
+evpn duplicate-mac-detection auto-recovery-time 5
+overlay local-tunnel-endpoint 0
+ source-interface loopback0
Once the confirm stage is completed and the candidate branch is merged into master, the third step of the pipeline is started:
config_deploy:
  stage: deploy
  script:
    - ansible-playbook -i ansible_inv arcos-evpn.yml -e comment="[$CI_COMMIT_SHORT_SHA]" --limit production
  tags:
    - evpn
  rules:
    - if: $CI_COMMIT_BRANCH == 'master'
This is the same Ansible playbook that was used in the test stage, just run against a different group of devices. One other nicety that GitLab provides is an environment variable that holds the short commit hash of the given commit. That hash string can be passed into the arcos_config module’s comment parameter, allowing it to be referenced in the device’s commit list:
root@leaf1# show configuration commit list
2020-05-07 03:11:15
SNo.  ID     User  Client  Time Stamp           Label  Comment
~~~~  ~~     ~~~~  ~~~~~~  ~~~~~~~~~~           ~~~~~  ~~~~~~~
1     10008  root  cli     2020-05-02 04:57:17         [762ede65] <-- commit hash value
2     10007  root  cli     2020-05-02 04:57:14         [762ede65]
3     10006  root  cli     2020-05-02 04:57:12         [762ede65]
4     10005  root  cli     2020-05-02 04:57:08         [762ede65]
5     10004  root  cli     2020-05-02 04:57:04         [762ede65]
6     10003  root  cli     2020-05-02 04:57:01         [762ede65]
7     10002  root  cli     2020-05-02 04:56:58         [762ede65]
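For readers who launch the playbook from a wrapper script rather than directly from the CI job, the same tagging can be sketched in Python. CI_COMMIT_SHORT_SHA is the GitLab variable used in the job above. The deploy_command helper is hypothetical, and it leaves out the comment when the variable is not set, which the role tolerates through its default('') filter.

```python
import os
import shlex


def deploy_command(limit, env=os.environ):
    """Build the ansible-playbook argument list used by the deploy job."""
    cmd = ["ansible-playbook", "-i", "ansible_inv", "arcos-evpn.yml"]
    sha = env.get("CI_COMMIT_SHORT_SHA", "")
    if sha:
        # Same bracketed form as the job definition, e.g. [762ede65]
        cmd += ["-e", f"comment=[{sha}]"]
    return cmd + ["--limit", limit]


cmd = deploy_command("production", {"CI_COMMIT_SHORT_SHA": "762ede65"})
print(shlex.join(cmd))  # shell-quoted form, suitable for logging
```

Passing the list to subprocess.run avoids shell quoting issues with the square brackets.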
The final stage, validate, is the same Ansible validation playbook that was run against the virtual topology, this time executed against the production devices:
validate:
  stage: validate
  script:
    - ansible-playbook -i ansible_validate_inv validate.yml
  tags:
    - evpn
  artifacts:
    paths:
      - ./*.png
  rules:
    - if: $CI_COMMIT_BRANCH == 'master'
By using just GitLab and Ansible, we were able to apply the NetDevOps concepts discussed earlier to realize a network automation pipeline with ArcOS. By leveraging these open-source tools, the network operations team can focus on delivering a streamlined set of configurations that are stored as inputs to a consistent, repeatable configuration process. This ultimately allows existing network infrastructure to change as rapidly as the business requirements demand.
Check out the following demo video showing this pipeline in action: