Extending the vSphere MAC address space

Recently another TAM requested some info on this topic in VMware’s Slack, finally giving me the push to publish, as I have had this post in my drafts for ages. Granted, it is somewhat of an edge case for only a handful of customers and the problem itself is not new, but here is my take. As always, please do your own research before you put this untested into production 🙂

Problem statement

If you have a high number of vCenters there is a chance that you might end up with the same MAC addresses across different vCenter instances.

But the TL;DR: Don’t panic if you are not a big enterprise or service provider. And even if you are one, think in terms of effort versus gains first.
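
If you want to know whether duplicates actually exist in your environment today, a quick inventory sweep across your vCenters is enough. Below is a minimal pyVmomi sketch; the vCenter names and credentials are placeholders and error handling is left out:

# requires: pip install pyvmomi
import ssl
from collections import defaultdict
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

VCENTERS = ["vc01.lab.local", "vc02.lab.local"]        # placeholders
USER, PWD = "administrator@vsphere.local", "***"

macs = defaultdict(list)                               # mac -> [(vcenter, vm), ...]
ctx = ssl._create_unverified_context()                 # lab only, self-signed certs

for vc in VCENTERS:
    si = SmartConnect(host=vc, user=USER, pwd=PWD, sslContext=ctx)
    content = si.RetrieveContent()
    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.VirtualMachine], True)
    for vm in view.view:
        if vm.config is None:                          # skip inaccessible VMs
            continue
        for dev in vm.config.hardware.device:
            if isinstance(dev, vim.vm.device.VirtualEthernetCard):
                macs[dev.macAddress].append((vc, vm.name))
    view.Destroy()
    Disconnect(si)

for mac, owners in macs.items():
    if len(owners) > 1:
        print(f"Duplicate {mac}: {owners}")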

Background

The default behavior for MAC address assignment to virtual machines (or rather network adapters) in vSphere is very well documented and I think everyone has come to know the famous prefix:

VMware Organizationally Unique Identifier (OUI) allocation assigns MAC addresses based on the default VMware OUI 00:50:56 and the vCenter Server ID. 

Source: VMware vSphere 6.7 documentation

vCenter Server ID

Let’s dig into this by starting with the vCenter Server ID. With the initial setup each vCenter gets a randomly generated ID that allows for 64 different values. You can find the vCenter Server ID of your current installation in the general settings of the vCenter.

VMware Organizationally Unique Identifier (OUI)

The OUI covers the first 24 bits (three octets) of a MAC address and in general identifies the vendor (it is actually a bit more complicated than that – but let’s leave it at that for now). A complete list of all OUIs is available from the IEEE (see further down). Taken from the same documentation page as before:

According to the VMware OUI allocation scheme, a MAC address has the format 00:50:56:XX:YY:ZZ where 00:50:56 represents the VMware OUI, XX is calculated as (80 + vCenter Server ID), and YY and ZZ are random two-digit hexadecimal numbers.
The addresses created through the VMware OUI allocation are in the range 00:50:56:80:YY:ZZ – 00:50:56:BF:YY:ZZ.

Source: VMware vSphere 6.7 documentation

In my case, all my virtual machines get a MAC from the 00:50:56:AB:YY:ZZ range. Counterintuitively, the specified offset of 80 is a hex number while the shown vCenter ID 43 is decimal. Converting these numbers gives us 80 hex == 128 dec and 43 dec == 2B hex, which makes a total of 171 decimal or AB in hex.
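
If you want to double-check the math for your own environment, the calculation is a three-liner in Python (the vCenter ID of 43 is just the value from my lab):

vcenter_id = 43                  # decimal value shown in the vCenter UI
prefix_byte = 0x80 + vcenter_id  # 128 + 43 = 171
print(f"{prefix_byte:02X}")      # -> AB, so VMs get 00:50:56:AB:YY:ZZ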

On a side note: About the number of VMs and available MAC addresses

The second thing people notice in the documentation is the limit of 64000 unique addresses per vCenter. Personally, I don’t think that is an issue if you look at the scale limits listed in VMware ConfigMax. As an example, here are the supported numbers for vCenter 6.7:

vCenter Server Scalability
Powered-on virtual machines per vCenter Server: 25000
Registered virtual machines per vCenter Server: 35000

Locally administered addresses (LAA) – the alternative to VMware provided MAC addresses

Before we continue let’s revisit the MAC address guidelines from the IEEE (little did I know about this before writing this post). For context, an Extended Unique Identifier (EUI) with 48 bits (= 6 bytes) is the base for an Ethernet MAC:

When an EUI is used as a MAC address (for example, an IEEE 802 network address), the two least significant bits of the initial octet (Octet 0) are used for special purposes. The least significant bit of Octet 0 (the I/G bit) indicates either an individual address (I/G=0) or group address (I/G=1), and the second least significant bit of Octet 0 (the U/L bit) indicates universal (U/L=0) or local (U/L=1) administration of the address.
A universally administered address is intended to be a globally unique address.

Source: IEEE – Guidelines for Use of EUI, OUI, and CID

This means we have an officially sanctioned way to provide our own MAC addresses, and the IEEE provides a bulky tutorial on how to handle these kinds of things.
In essence, you must ensure that the second least significant bit of the first byte is set to “1” – as seen above, the official docs call it the second least significant bit (X bit) of Octet 0. In a visual representation:

MAC address: 
1st byte | 2nd byte | 3rd byte | 4th byte | 5th byte | 6th byte | 
00000010 | ...
      ^
This bit must be set to 1 if you want to have a valid LAA.
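
To make the bit fiddling concrete, here is a small Python check for the U/L bit (the example addresses are made up):

def is_locally_administered(mac: str) -> bool:
    """True if the U/L bit (second least significant bit of octet 0) is set."""
    first_octet = int(mac.split(":")[0], 16)
    return bool(first_octet & 0b00000010)

print(is_locally_administered("02:EE:DE:01:00:01"))  # True  -> LAA
print(is_locally_administered("00:50:56:AB:12:34"))  # False -> VMware OUI, universally administered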

No rule without an exception

Approximately 18 organizational identifiers assigned to early Ethernet implementers (some of the BlockID assignments made prior to approval of IEEE Std 802.3-1985) have the X bit equal to 1.

Source: IEEE – Guidelines for Use of EUI, OUI, and CID

Make sure you check your selected MAC address range against all officially listed assignments on the IEEE page. An unofficial but comprehensive overview is also available from the Wireshark team.

Implementation

With the information above we can select a suitable, private MAC address range like:

02:xx:xx:xx:xx:xx (the first byte is: 00000010)
06:xx:xx:xx:xx:xx (the first byte is: 00000110)
0A:xx:xx:xx:xx:xx (the first byte is: 00001010)
0E:xx:xx:xx:xx:xx (the first byte is: 00001110)

Taking the 02:xx:xx:xx:xx:xx-namespace as an example:
In theory there are 256^5 addresses available (02:00:00:00:00:00 – 02:FF:FF:FF:FF:FF), compared to the 256^2 addresses normally provided by a vCenter (two remaining bytes).
With regard to the vSphere documentation you can select between prefix- or range-based allocation to assign the addresses. The major difference is basically how granularly you want to specify the range.

For my example I will use the prefix-based assignment, for which two parameters in the vCenter advanced configuration need to be modified:

config.vpxd.macAllocScheme.prefixScheme.prefix
config.vpxd.macAllocScheme.prefixScheme.prefixLength

You can look up the details in the documentation but in short:

The parameter config.vpxd.macAllocScheme.prefixScheme.prefix defines a fixed prefix for an address, much like an IP prefix (e.g. 10.x.x.x), while config.vpxd.macAllocScheme.prefixScheme.prefixLength defines how many bits of the address are fixed by that prefix, much like a subnet mask for an IP (e.g. /16).

Example

Let’s generate a scheme of <LAA prefix>:<CLOUD>:<REGION>:<VC number>:xx:xx which allows a setup with up to 256 different cloud environments with up to 256 regions each and up to 256 vCenter instances in each region. We use these values as an example:

  • LAA prefix = 02
  • Cloud = EE
  • Region = DE
  • VC number = 01

The resulting advanced settings for this specific vCenter would be:

config.vpxd.macAllocScheme.prefixScheme.prefix = 02eede01
config.vpxd.macAllocScheme.prefixScheme.prefixLength = 32

This would assign virtual machines MAC addresses from 02:EE:DE:01:00:00 to 02:EE:DE:01:FF:FF. Please note that this will not change the MAC addresses of virtual machines that were created beforehand.
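
To avoid typos when rolling this scheme out to many vCenters, the prefix value can be generated rather than typed by hand. A minimal Python sketch using just the example values from above (the helper name is mine):

def mac_prefix_settings(cloud: int, region: int, vc_number: int, laa_prefix: int = 0x02):
    """Build the prefixScheme values for <LAA>:<CLOUD>:<REGION>:<VC>:xx:xx."""
    prefix = f"{laa_prefix:02x}{cloud:02x}{region:02x}{vc_number:02x}"
    return prefix, 32  # four fixed bytes = 32 bit prefix length

prefix, length = mac_prefix_settings(cloud=0xEE, region=0xDE, vc_number=0x01)
print(prefix, length)  # -> 02eede01 32
# leaves 2^16 = 65536 addresses per vCenter (02:EE:DE:01:00:00 - 02:EE:DE:01:FF:FF)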

Summary and gotchas

Setting a new MAC address prefix by changing two advanced options is not hard but every design decision has drawbacks. Let’s discuss a few that come to my mind:

It all starts with the address space: since you have the freedom to manage the addresses yourself, you are also responsible for implementing a scheme that ensures no overlapping or conflicting ranges are defined. This all boils down to people, process and technology – or as the IEEE puts it:

Local addresses are not globally unique, and a network administrator is responsible for assuring that any local addresses assigned are unique within the span of use.

It is not only the vSphere side that needs to be considered in this address scheme; note that nearly all vendors use LAAs for their virtual IP implementations. This means a company-wide solution is needed, since you don’t want to find out the hard way that physical networking devices such as load balancers (as one example) already use that range.

The second consideration is: Do you need it at all?

Apart from some ill-written software which has license restrictions based on the MAC address of a system (god, I hate these!), your MAC address should only be relevant for layer 2 communications, basically up to the next router. Do you have more than 64 vCenters on the same campus communicating with the same set of switches before a router, so that a duplicate MAC address might matter at all?

Overlay networking may change things a bit as people may think of it as a viable solution to extend layer 2 between data centers, but that’s another set of problems.

A high-level overview: The edge in NSX-v vs. NSX-T

While bringing myself up-to-date with NSX-T I struggled with the terminology in relation to my existing NSX-v knowledge. There are quite a few awesome blog posts around -v or -T and I am trying to compare the “edge” at a high level.

I hesitated to make a public post out of it since I am myself quite new to NSX-T and my post may not be 100% correct, but then figured that I can just learn from the feedback. So here are my notes with the risk that someone else has done the same already and that my post is full of errors.

NSX-v

In NSX-v the term edge usually refers to the edge services gateway (ESG) although, strictly speaking, the control VM for the distributed logical router (DLR) is also an edge (but we leave that aside for now).

The edge (ESG) is always deployed as a virtual machine on a vSphere cluster and the main focus is the handling of north/south traffic as well as upper layer services. Therefore the ESG provides routing capabilities, dynamic or static, between your virtual and physical networking infrastructure and can offer load balancing, VPN, DHCP, …

The NSX manager takes care of the full lifecycle for the VM. To handle increased throughput or to provide sufficient resources to the networking services the edge comes in different sizing flavours.

An edge cluster in NSX-v is essentially a vSphere cluster that hosts the NSX edge VMs. That cluster must

  • be part of the NSX-managed vCenter
  • be connected to the NSX transport zones
  • have one or more VLANs assigned to the physical NICs of the ESXi hosts for uplink connectivity to the physical network infrastructure

In this cluster the edge VMs usually peer with a physical router over a dynamic routing protocol. Apart from methods like active/standby edges or ECMP, the vSphere cluster contributes to the availability of the solution through vSphere HA.

All traffic going to or coming from the physical network infrastructure therefore passes through the edges and hosts on this cluster.

NSX-T

In NSX-T it is a bit different:

The purpose of NSX Edge is to provide computational power to deliver IP routing and services.

This means the edge itself does not deliver the capabilities of an ESG, but rather provides the resources to run these functions as part of another instance (which we will discuss later on). So basically the NSX-T edge is like an ESXi hypervisor – just for network functions.

Edges in NSX-T therefore:

  • Provide compute capacity for instances that host network services and north-south routing
  • can be deployed as a VM to a compute manager (vSphere only)
  • but edges can also be bare-metal instances (with very strong design constraints)

When you think of bare-metal edges the following just makes sense, but it is often overlooked: Edges in NSX-T are transport nodes and therefore part of the transport zone. They do not need to be on an NSX-enabled cluster like in NSX-v.

A (very) simple diagram of the different termination points for the overlay tunnel in NSX-v vs. NSX-T

Now another aspect to look at is the availability of the edge; this is provided by aggregating edge nodes into edge clusters. With NSX-T an edge cluster is therefore no longer a vSphere cluster that hosts the edges, but an aggregation of multiple edges to achieve availability and performance goals. And yes, you can have multiple edge clusters for various reasons 🙂

And what is this instance the edge node provides the compute capacity for? Well, basically for stateful services as part of a service router (SR). And what is that now? Let’s look at a few terms that come up in the NSX-T world:

  • Logical router
  • Service router
  • Distributed router
  • Tier-0 and Tier-1 gateway

The network path from a VM to the physical network infrastructure (I omitted some elements for clarity)

In NSX-T a logical router is more of a concept or description than a function or a specific instance (I would argue that most often the combination of service and distributed router is called a logical router). Two aspects of logical routing are the independence (decoupling) from underlying hardware (who would have guessed that from VMware?) and multi-tenant support. These concepts are implemented by Tier-0 and Tier-1 gateways.

Tier-0 and Tier-1 gateways each consist of a distributed router and a service router (with exceptions to the rule). The terms Tier-0 and Tier-1 themselves can, to mention just one aspect, be described as the relative position of the gateway in terms of infrastructure placement:

The Tier-0 gateway is closest to the physical infrastructure and often referred to as “Provider gateway”. As such the Tier-0 gateway takes care of the connectivity towards the physical network, either with static or dynamic routing. Therefore the Tier-0 gateway is in general used for north-south traffic.

Implementing a Tier-0 gateway requires an edge cluster with at least one node because we have elements here that cannot be part of a distributed router (i.e. physical uplink connectivity) and need a service router.

The Tier-1 gateway is the “tenant gateway” and must be connected to a Tier-0 gateway by a RouterLink; it cannot connect to the “outside” on its own. Furthermore, a Tier-1 gateway can only be connected to a single Tier-0 gateway at a time.

A Tier-1 gateway does not need an edge cluster unless we implement stateful services. However, if you select an edge cluster, an SR will be created automatically, so think twice before doing this. The Tier-1 gateway handles mostly east-west traffic and is usually the default gateway for your virtual machines in the segments.
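
If you prefer the API over the UI for a quick look at which gateways exist, you can also query the NSX-T policy API directly. This is only a rough sketch under the assumption that the policy endpoints /policy/api/v1/infra/tier-0s and /policy/api/v1/infra/tier-1s and basic auth are available in your version – the manager FQDN and credentials are placeholders:

import requests

NSX = "https://nsx.lab.local"    # placeholder for the manager/VIP
AUTH = ("admin", "***")

for path in ("/policy/api/v1/infra/tier-0s", "/policy/api/v1/infra/tier-1s"):
    # verify=False only because of the lab's self-signed certificate
    resp = requests.get(NSX + path, auth=AUTH, verify=False)
    resp.raise_for_status()
    for gw in resp.json().get("results", []):
        print(gw.get("resource_type"), gw.get("display_name"))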

Now that we know of gateways what kind of routers are there?

A distributed router is what you would refer to in NSX-v as a Distributed Logical Router (DLR): a router that sits in every transport node (most of the time an ESXi host) and enables routing at the source hypervisor. This covers all your traffic within the NSX world; the traffic flow is not visible from the outside and it uses the underlay as transport between the nodes.

The service router handles all the stateful services that cannot be distributed:

  • Load Balancer
  • VPN
  • DHCP
  • DNS Forwarder
  • On Tier-0: Physical uplink connectivity

As mentioned above, these functions are instantiated on the edge. If you have a lab (read: non-prod environment), you can actually try to log into your edge and run a “docker ps” to see them. Another often underestimated fact is that both tier gateways can run a certain set of stateful services – not the same features, but still a big thing if you are coming from NSX-v.

You can probably fill a book on the topic of Tier-0 gateway designs with or without ECMP, different options for availability and so on. Here are a few blog posts that I can recommend for further reading:

Closing words

The big thing is to get your head around these new concepts. Read blog posts, read the documentation, try it out. But once this is done you start to see the improvements for architecting an environment. Now you can draw a line between the infrastructure design (e.g. for management and edge) and the actual network design.

Some notes on vIDM in general and ADFS integration

Over the last week I had my first contact with vIDM and the fine task of integrating it with ADFS for a customer, plus some tasks around NSX-T in my lab. There were some caveats I ran into, and this page is more like my online notes in case you stumble here with the help of Google-fu.

Here are the sources for reference for the ADFS integration:

The NSX-T integration with vIDM is described in great detail by my colleague Romain Decker (who has some awesome content on his blog, btw).

Access denied or how to force local login

When my configuration didn’t work after following the guides, I was unable to go back to the admin console because I was redirected to my default authentication method. I was then looking for a way to log in against the system domain. Use the following URL to enforce this:

https://<vidm>/SAAS/login/0

vIDM authentication methods for an IDP

When you create an Identity Provider (IDP), vIDM forces you to specify an authentication method. Both guides specify the classes

  • urn:oasis:names:tc:SAML:2.0:ac:classes:Password
  • urn:federation:authentication:windows

During our debugging session I learned from the ADFS folks that these classes are not a universal standard or default, but depend on what your provider has configured. Unfortunately, this is a mandatory field and hence you need to talk to your ADFS team first about what they expect from you. If nothing else helps, set this to

  • urn:oasis:names:tc:SAML:2.0:ac:classes:unspecified

Change the authentication method for an IDP

So you created an IDP but you made an error? You can only change the “authentication method” if it is not in use. Change the policy that uses this authentication method to another setting, make your changes and re-include your authentication method.

Debugging SAML messages

When I configured the ADFS integration it didn’t work and I didn’t know why. The way forward was to capture the SAML message and see which failure was thrown. AWS provides a nice summary on how to capture the SAML response in your browser here:

https://docs.aws.amazon.com/IAM/latest/UserGuide/troubleshoot_saml_view-saml-response.html

Once you have the content you can use this page to decode the response and try to make sense of it all:

https://www.samltool.com/index.php
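
Since the captured SAMLResponse is just base64-encoded XML, you can also decode it locally instead of pasting it into a website. A small helper, assuming you saved the raw value to a file (URL-decode it first if your browser copied it that way):

import base64
import xml.dom.minidom

# paste the captured SAMLResponse value into saml_response.txt first
with open("saml_response.txt") as f:
    raw = f.read().strip()

xml_bytes = base64.b64decode(raw)
print(xml.dom.minidom.parseString(xml_bytes).toprettyxml(indent="  "))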

Get the vIDM certificate thumbprint

There is an official page in the VMware documentation; however, I found that you can shorten it down to this (with possible improvements to use OpenSSL remotely to reduce the steps further):

  • SSH to the vIDM host and log in as sshuser.
  • su root or sudo -s or whatever suits you to get root access
  • Change directory cd /usr/local/horizon/conf
  • Get the thumbprint: openssl x509 -in <FQDN of vIDM host>_cert.pem -noout -sha256 -fingerprint
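
As mentioned above, the lookup can also be done remotely without logging in to the appliance at all. A sketch using Python’s ssl module instead of OpenSSL – the FQDN is a placeholder, and note that this fingerprints whatever certificate the host presents on port 443:

import hashlib
import ssl

host = "vidm.lab.local"                           # placeholder
pem = ssl.get_server_certificate((host, 443))
der = ssl.PEM_cert_to_DER_cert(pem)
sha256 = hashlib.sha256(der).hexdigest().upper()
print(":".join(sha256[i:i + 2] for i in range(0, len(sha256), 2)))
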
Update 2019-08-01:

Find the vIDM debug logs

The bulk of the vIDM log files is not in the standard directory /var/log:

  • SSH to the vIDM host and log in as sshuser.
  • su root or sudo -s or whatever suits you to get root access
  • change directory to /opt/vmware/horizon/workspace/logs
  • If you need to increase the verbosity, edit /usr/local/horizon/conf/saas-log4j.properties

NSX-T: A quick glance at the new version 2.4 management/control plane of NSX-T

I worked quite a bit with NSX-v over the last years, but I am just coming to terms with NSX-T. So I am all the more excited to get my hands on the latest version (2.4) and to try out the new way of handling the management/central control plane.

Status quo

For NSX-v and NSX-T prior to 2.4 the setup of management and control plane was like this:

  • A single VM for the manager, which serves as the API endpoint
  • Three controller VMs.
  • (Only on NSX-T) a policy manager VM.

Changes with version 2.4

Starting with this new version all mentioned functions (and hence appliances) are consolidated with the ability to create a management cluster.

If you want to get started, here are the most important links:

How do we get things going with 2.4?

As with prior versions, the first step is to deploy the NSX unified appliance. Since the dialog didn’t change much I will omit the screenshots; you can find the docs here. Just note:

  • that you will now need a 12-character password – password managers FTW!
  • There is a new role “nsx-manager nsx-controller” for the appliance, reflecting the change.

After deployment and logging in, you will see the new UI which I really like as it is very clean.

Following the links to the NSX nodes you can manage your installation and that includes adding a virtual IP for your management cluster.

Setup of the virtual IP was as easy as typing the desired value in the text box after hitting “EDIT”. And as you can see above, I am already accessing my NSX UI by the virtual IP.

Break-out: The cluster status

The front-end is running on the second controller as I decided to break a few things on the first node (on purpose *cough*). A click on “DEGRADED” will reveal what is up with my NSX installation – neat!
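
The same information is exposed through the NSX manager API, which is handy for monitoring. A hedged sketch – I am assuming the GET /api/v1/cluster/status endpoint here, so double-check it against the API guide for your version; manager address and credentials are placeholders:

import json
import requests

# placeholder manager/VIP address and credentials; verify=False for the lab's self-signed certificate
resp = requests.get("https://nsx.lab.local/api/v1/cluster/status",
                    auth=("admin", "***"), verify=False)
resp.raise_for_status()
# the response contains the management and control cluster status
# that the UI renders as STABLE/DEGRADED
print(json.dumps(resp.json(), indent=2))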

Adding a new node (UI)

With the first node in place, you want to add two additional nodes for a proper cluster. The UI provides you with a nice wizard to do so; obviously there are other, more automated ways of deployment, but I’ll leave that for another day.

To deploy nodes, a compute manager (vCenter) needs to be added to the NSX manager. This process hasn’t changed, so I didn’t include it here.

  • Note that you can provide a common set of information for all nodes here (as the dialog title “common attributes” suggests).
  • The appliance sizes are relevant for your targeted deployment; most SMEs will probably use medium, whereas bigger installations will go for large nodes. The NSX-T documentation as well as ConfigMax will guide you on sizing.
  • Take into account for your resource and HA planning that the RAM for all nodes is reserved.
  • I cannot give a support statement here, but as you can deploy new NSX nodes and decommission old ones on the fly, I would imagine that you can swap out smaller nodes one after the other in a merry-go-round fashion if you hit a limit.

In my case I wanted to add just one additional node, but the dialog would also allow you to deploy more nodes if needed.

After that, the node is deployed and shows up in the UI.

Summary

I really like the new approach of managing the appliances. This means less admin overhead/complexity for the operations folks while providing an increased availability for the management plane.

Homelab 2019: Overview

As I started to build my own homelab, I thought why not share it. Some of it is still work in progress, like the 10G NICs, and I am still debating with myself on some other networking gear.

What are my goals for the homelab?

  • Provide sufficient resources for testing; the aim is to host more or less the VMware SDDC stack (or components for the SaaS flavor of our products in some instances)
  • Keep noise to a minimum, it must be “silent” from 0.3 m away
  • Keep power consumption down, I want to spend less than a euro per day
  • Re-use existing hardware where possible
  • All things must fit into my office in terms of space

On a high level my plan is to build a resource cluster to host my extended monitoring and management workloads (i.e. everything that is not required to run this lab) and some nested environments for the actual tests. As supporting infrastructure I have a standalone host that will be used for the core management components and co-function as a jump box.

In somewhat more detail it should look like this from a workload distribution perspective:

and the final vCenter layout is going to look like this:

Resource cluster

The question is always how to build the lab in the first place:

Small boxes

Smaller nodes like NUCs have a cheaper price point per unit for the box itself but will max out on resources fairly quickly. Here the price goes up with the number of devices required, and personally I saw two drawbacks:

  • each device adds more overhead to my environment (cabling, cooling, system overhead)
  • Scalability is very limited; basically you can only scale out, which again adds overhead and can be a PITA if you are, for instance, trying to use vSAN and just need one more resource.

Big boxes

Okay, so small boxes didn’t fit my needs – what about “big boxes”? Real big boxes like servers can pack in tons of resources, but

  • they would not meet the requirements of being quiet
  • I expect the power consumption would be too high
  • They take up a lot of space

The middle ground (BYO)

As I couldn’t find a suitable product (the Supermicros seem not to be easy on the ears), I went the “build your own” route. Here I opted for the middle ground: a decent amount of RAM but still consumer hardware which can be tweaked to lower power consumption and noise emission. Finally, this is what I ended up with:

In terms of RAM I opted to go with 64 GB hosts as this reduces the overhead per system and allows for a higher consolidation ratio. Downside of this: it drives up the cost per host if I ever want to expand to a third node.

With six CPU cores per host a decent amount of resources is provided, so I can easily enable things like vSAN dedup/compression and still have enough CPU power left to host some VMs on top of nested hypervisors.

I had the choice of a NAS solution or vSAN as HCI for the storage layer. I opted for vSAN/HCI as it allows me more flexibility on how to assign resources (e.g. I would not need to provide extra FTT to a nested ESXi) and I can just add capacity disks as needed. Major downside is the extra resource consumption per host.

List of components

So what does a host look like in more detail?

  • CPU: 1* Intel Core i5 8400 6x 2.80GHz
  • RAM: 4* 16GB G.Skill Aegis DDR4-2400 DIMM
  • Mainboard: GIGABYTE Q370M D3H GSM PLUS
  • NIC (PCI1): <not yet decided on>
  • ESXi boot disk: 1* 120GB Kingston A400 2.5″ SATA
  • vSAN cache device: 1* 250GB Samsung 970 Evo M.2 2280 PCIe 3.0 x4 (Node 2 has a WD Black 250GB NVMe)
  • vSAN capacity device: 1* 512GB SanDisk X600 2.5″ SATA (Node 2 has a SanDisk Ultra 3D 512GB)
  • Power Supply: 1* 400 Watt be quiet! Straight Power 10 Non-Modular 80+ Gold
  • Case: Thermaltake Core V21

And of course a photo of it all, excuse me for the mess in the other compartments of this fine piece of IKEA furniture 🙂

Standalone host (existing)

The idea is to leave the management host on all the time, so the workloads deployed there should be kept to a minimum – but they still need to include the essentials required to start the rest of the lab.

Fewer hardware components in the box keep power consumption to a minimum, and at the end of the day I need to pay the bills, so I decided to stick with what I already have in terms of investment.

I also opted to use my NUC in a workstation setup (that is, not ESXi as the OS) so I can have a jump box with my tools on it – makes life so much easier.

List of components

  • System: 1* Intel NUC-Kit i3-7100U 2.4GHz HD620 NUC7I3BNH
  • RAM: 2* 8 GB
  • Boot disk: 1* 512GB Samsung Evo 840 SATA
  • Disk: 1* Crucial MX500 512GB M.2

Summary

This is just a first overview; as I build this out over the next weeks I will try to add some more content on networking and storage as well as deployment/management of the components.

VCDX: Some thoughts on requirements

What is this about?

This blog post does not intend to define what a requirement/constraint is, there are already very good posts out there. Lately I have pointed a lot of people over to Jeffrey Kusters, who did an excellent job of summarising it, but I have also included two other good links:

If you are like me, coming from a purely technical background, the conceptual model of the VCDX exam proves to be the hardest part – especially since your journey starts with it and there is no shortcut around it (Rene gives good advice, as always: https://vcdx133.com/2014/07/09/vcdx-how-do-i-measure-if-my-customer-requirements-are-being-met/). Think of the conceptual model like the foundation of a house: if it is not solid, everything you build on top of it will collapse eventually (btw: it happened to me, forcing me to rewrite more than once. Pro tip: don’t be like me on this).

Summing it up: Ideally this post forces you to realise that you need to invest time into learning how to develop a conceptual model and, as this post has a focus on requirements, to learn that they do not come out of thin air. There is actually a whole field called “requirements engineering” whose goal is to gather and formulate requirements – just to give you a feeling for the relevance of this topic.

Requirements engineers have methods and techniques which you can study. Use this to build an understanding of what is relevant and how it is done. Try to apply this by formulating some solid requirements for your VCDX process (and keep that knowledge for your future projects).

Where do I start?

You know, there is always Google 🙂

Seriously, (solid) requirements are needed in a lot of places; one of my favorite reads is provided by NASA.
They have a whole book online, for free – start looking at chapter 4, the System Design process: https://www.nasa.gov/connect/ebooks/nasa-systems-engineering-handbook

Do you need to read all of this and how does this all apply to VCDX?

Heck, no you don’t need all of this! But start digging into it and you will learn some good stuff – the key here is building an understanding of why requirements are important. For instance, I like “TABLE 4.2-1 Benefits of Well-Written Requirements”.

Also, did you consider talking to people who deal with requirements on a daily basis? Do you know any project manager or software developer/architect? They might be more than happy to help you out.

Can you sum it up, what does it mean for my VCDX document?

I cannot give you a definitive answer but a few personal opinions:

  • Write a requirement like the stakeholder investing money, not like the tech nerd you are (I include myself here).
  • Don’t focus on the implementation and do not make a hidden design decision out of a requirement: Focus on what the system/infrastructure needs to achieve, not how.
  • Did you test whether other people understand your requirement? Ask around, also among non-technical people. Does everybody expect the same thing when reading your requirement?
  • For the majority of requirements, do not use subjective adjectives, e.g. what do you mean by fast storage? People might have different opinions on that.
  • Going in the same direction as the bullet point above, can you validate your requirement in any way? (Yes, this is one reason why there is a validation plan in the VCDX)
  • Be specific, set scope and expectations: Like when you include growth in percent, is it measured from your baseline or a “year over year”-value? For how many years do you need to plan? Which areas (compute, storage, …) do you need to consider?
  • Avoid any misinterpretation with negative requirements, e.g. must not do X or Y. The “not” might easily be overlooked, and there is still room for the question of what the design must do.

On the topic of how much meta-data a requirement needs, I had a table with the following information:

  • Unique ID: Allows you to reference the requirement in your design
  • Description: The main matter of a requirement.
  • Design quality: More for my sake to ensure I got everything covered
  • Issuer: Who signed off on the money going into this requirement?

I won’t say it is perfect but it did the job and it may be a good starting point if you haven’t considered anything in this regard.

The end

This is not much but I hope it points candidates into the right direction. I am always open for discussion and feedback, hit me on twitter if you like!

Disclaimer: Honestly, I feel like an imposter for writing this, constantly debating with myself whether I dare to put this out into the wild, as I feel that my own stuff was not stellar. However, with some support from Bilal and Chris I decided to go for it. After all, it is a topic most candidates struggle with and I was no exception.

VCDX basic skills: the whiteboard

Disclaimer: I am by no means an expert on this topic, nor do I have the feeling that my performance during the VCDX defense was exceptional. However, some basic training got me from “zero” to at least one step further – more or less readable handwriting.

We all take it for granted – there is a whiteboard in the room and we all know how it works, right? In any kind of meeting, when there is an idea in the room, you need to take notes or need to visualize something – you start drawing and writing on the whiteboard. Then one of the next sentences is often “sorry for my handwriting” or “I hope you can read this”.

I was in the same position and it felt somehow “unprofessional”. In the weeks before my VCDX defense, with the design scenario looming over my head (which relies heavily on the whiteboard), I wondered if I could learn or improve my skill set in this regard.

Guess what – thank you, internet! There is nothing you cannot find on the internet, you just need to know what to look for.

The tool: Whiteboard marker

Another thing we tend to ignore is the whiteboard marker (except when it is dried out, then we start swearing). But did you ever wonder why there are different tips? In the picture you see the chisel tip (top) and the bullet/round tip (bottom).

Different tips serve a purpose

Chisel tip: Often between 2 and 5 mm. The best use case for this marker is writing. Vertical lines are slender and horizontal lines are bold. This makes your writing easy to read, especially from further away.

Bullet/round tip: Around 2 mm. Use this for drawing. Your line width will not differ by direction and you can add more detail.

Chisel tip on top, bullet tip at the bottom.

  • VCDX tip #1: Bring your own markers into the room. You do not want to deal with a dried-out pen.
  • VCDX tip #2: You won’t be able to switch between different marker tips in the room; time is of the essence and you don’t want to start fumbling around. Just practice and stick with one type of marker. Which one works best you need to find out for yourself.

If you want to go to expert level on different brands, see here. Personally I didn’t want to spend a fortune, and the box that comes with the Staedtler pens is quite handy.

The overlooked: Handwriting

If you can spare a few minutes, read this blog post by Yuri Malishenko for a great overview on how to improve. For me this was really the starting point for all further efforts.

If you want a TL;DR at this point: Learn to write in CAPITAL BLOCK LETTERS, no one can read the stuff you learned in school. Now you say you do not have a whiteboard at home?

Flashcards can be used to improve your writing and to help you remember information

I got back into the habit of writing flashcards. With the right pens (in my case 1–2.5 mm chisel-tip pens) you can practice your capital block letter writing and at the same time memorize important information.

Another aspect is the size of your handwriting, again referring to an excellent article by Yuri Malishenko – use rule #5 to determine the size.

TL;DR: As a general rule of thumb, write a letter about 3 cm high. That is as high as the index and middle finger put together, e.g. if you write with the right hand, use the fingers of the left hand to determine the size of your letters.

Often forgotten: T-T-T

A golden rule when working at the whiteboard or flipchart is:

  • touch
  • turn
  • talk

Please, do not talk to the whiteboard. Now repeat: Do not talk to the whiteboard!

Touch: When you are finished writing or drawing, touch the point on the whiteboard you want to highlight/talk about

Turn: Turn to the audience and make sure you got their attention

Talk: Now you can start explaining.

Summary

All of this takes some time to learn and personally, I keep improving all the time. Perhaps this blog might help you prepare for the VCDX by giving you some confidence in an area you had not considered so far.

VCDX: Thank YOU

On December 13th, 2018 I received an email: 

For me this marks the end of a chapter which I would call the VCDX journey and I have to thank many, many people for supporting me along the way up to this point.

Easily the longest supporter of my efforts has been Bilal Ahmed. In his always good-natured way he has managed to guide me since the VCAP design days with solid advice and motivation. In the last weeks before the defense alone, he went out of his way to connect me with mock panelists so I could refine my presentation over and over in the final days before going in.

Yet another important person is David Pasek, whom I contacted right after joining VMware to ask if he would mentor me. I guess without him I would still be editing and redoing my document. David is not only a seemingly endless source of knowledge but also has the great gift of cutting through the noise and focusing on the important parts, always able to get me back on track.

Also, thanks a lot to all the other VCDX mentors who helped me along the way, like Paul Meehan, who is not only a great guy but has tons of knowledge to share and always motivated me to keep pushing. Paul McSharry, who, before becoming a panelist, did a ton for the VCDX community. Per Thorn always had high-quality and in-depth answers. Gregg Robertson for doing all the work with in-person mocks in the UK as well as the Slack channel, both of which are vital for future VCDX candidates. Update: Damn, I forgot to mention Ben Mayer – in the time after submitting the docs he helped me with multiple scenario sessions and valuable advice for the presentation.

A special thanks to Manny Sidhu for getting up at 5 a.m. (!) to attend one of my mocks and many more who donated their spare time (like Shady). The “closing call”, the last mock defense I had, was actually only about 12 hours before going into the room featuring a panel of Kiran Reid, Jason Grierson, Bilal and the future VCDX #273, Kenneth Fingerlos (to be honest, this session left me a bit shaken but it was great with some valuable lessons).

Also, here is one shout-out to my favorite Slack group with guys like Bilal, Kyle Jenner, Chris Porter and Mat Jovanovic. It is always a great mixture of banter and solid knowledge exchange with you guys. Chris also organised a mock session during VMworld Barcelona which was a direly needed wake-up call for me to get on with my presentation (thanks to everyone who attended that session in BCN).

During all this time I was fortunate enough to have support from my employers (current and past). At VMware from Matthias Diekert, who, without batting an eyelid, offered me full support by covering travel and expenses. At my former workplace the CEO and team lead supported my efforts, too.

Last but not least … the family. Man, they say you can’t do the VCDX without the family, and they are right. With a second kid in late 2017 and a job change in mid 2018, VCDX was no fun in the spare time, often leaving me only the hours between 10 p.m. and 1/2 a.m. for my work. My partner supported me all the time, either by “kicking my ass” to get up and start writing/studying again or by taking the kids out for a weekend on the days before the deadline – just so I could work all day (and night) to finish it.

What’s next?

VCDX was a time and resource-intensive process, at least for me. Getting back to a more normal work/life balance with the notion of picking up some sports again is one of my goals for 2019.

But being in IT, you cannot stay in one spot, and from a professional perspective I fell behind on my training schedule (yes, I keep one for myself as part of the goals I want to reach. If you don’t do this, perhaps Melissa might change your mind). Next priorities are to catch up with public cloud, some sort of automation and, very specifically, NSX-T.

Perhaps I will make it to a VMUG and find a topic to present. I have always wanted to do it but so far I do not know what to talk about. Also, some more blog posts wouldn’t hurt, so there is another thing to do.

Recovering the VCSA on a vSAN cluster

Disclaimer: The credit for the answer goes to John Nicholson (http://thenicholson.com/) a.k.a. lost_signal from the VMware SABU and I added some points.

While going through my physical design decisions, I came across a simple question for which I couldn’t find an immediate answer:

How can I restore my vCenter instance (VCSA) if I put it on the very same cluster it is supposed to manage? Can I restore directly on vSAN via an ESXi host?

As my Google-fu let me down, it was time to start a discussion on reddit:

vSAN question: Restore VCSA on vSAN from vmware

 

TL;DR: The good news is: yes, you can recover it directly, and with vSAN 6.6 clusters this is straightforward with no prerequisites. Look into the vSAN Multicast Removal guide for the post-processing steps.

As there are other aspects you generally need to consider (not only for vSAN), I decided to summarize some basic points (for 6.6 and onward clusters):

  • First things first, make a backup of your VCSA on a regular schedule along with your recovery objectives.
    • If you are on vSAN you should look for SPBM support in your selected product: the good if you have support, the bad if you don’t have it
  • Create ephemeral port groups as recovery options for the VCSA and vSAN portgroups
    • This is not vSAN specific but should be generally considered when you have the vCenter on the same vDS it manages
  • Make a backup of your vDS on a regular basis (or at least after changes)
  • Export your storage policies
    • Either for fallback in case you make accidental changes or for reference/auditing purposes
    • You might need them in case you are ever forced to rebuild the vCenter from scratch
  • John pointed out that a backup product with “boot from backup” capability (e.g. Veeam Instant restore) doesn’t raise the initial question at all, as an additional (NFS) datastore is mounted.
    • A point from myself: Verify the impact of NIOC settings if you followed the recommended shares in the vSAN guide for the vDS. The NFS mount uses the management network VMkernel interface, which is quite restricted (note that this would only apply if you have bandwidth congestion anyway).

I would be more than happy if anyone is willing to contribute to this.