First off, Knowledge is King
Second, if it's faster/easier/better for you to use a home lab, don't stop 🙂 keep learning
That being said, it's beyond me why more people don't use shared labs
Pooling resources is what we have been doing for the last decade, and many more before that
If everyone is buying NUCs that aren't utilized all of the time, why is that a good idea?
If you have unlimited money to spend on home labs, by all means keep going down that path
There are plenty of arguments against it
Red Tape
X broke my lab, now I have to rebuild
Y powered off my server during a demo
Company won’t pay for hardware
Company won’t pay for power/colo
We won’t use the people to maintain it we are a consultant company not a hosting company
Just use Azure/Ravello/AWS/Google Cloud/Air
If I leave the company I lose access to the lab
If every company that has consultants took 1 or 2 hours of billable time each month and "used" that for a shared lab, I believe everyone would be better off.
And since many agree that pay isn't always the first reason to join a company, perhaps offering access to a real playground will give you easier access to talent, and if nothing else help retain the people you already have
So can a company really afford not to provision lab equipment for their staff?
We also have some good friends who help out sometimes, and they get access to the playground; help being anything from hardware/software/time or just being good guys/girls
We are a small company (even by Danish standards), but we still have and maintain a rack where we keep our gear for testing/playing/demoing
One of the reasons is that many people (at least imo) don't want hardware at home even if the company paid for it: it's noisy and uses too much electricity, and at least when we are doing edge stuff the lack of multiple public IP addresses is a pain
If i didn’t have a decent lab i wouldn’t spend the time i have poking around
Waiting for slow hardware isn’t good for my mood
Fabric work is hard to do purely virtually, you need iron at some point (for now)
Forgetting to power off cloud usage is a pain (see the sketch after this list)
Forgetting to power on before a demo is a pain
Having to power on to test/check something is a pain
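For the cloud side the power-off pain can at least be scripted away. Below is a minimal sketch, not something we actually run, using the Azure Python SDK (azure-identity + azure-mgmt-compute) that deallocates every VM carrying a made-up autoshutdown=true tag; the idea is to schedule it every evening so a forgotten lab VM stops billing compute.

```python
# Minimal sketch: deallocate Azure VMs tagged for auto-shutdown.
# Assumptions: azure-identity and azure-mgmt-compute are installed,
# AZURE_SUBSCRIPTION_ID is set, and the "autoshutdown" tag name is
# just an example convention, not anything from the post.
import os

from azure.identity import DefaultAzureCredential
from azure.mgmt.compute import ComputeManagementClient


def deallocate_tagged_vms(subscription_id: str) -> None:
    client = ComputeManagementClient(DefaultAzureCredential(), subscription_id)
    for vm in client.virtual_machines.list_all():
        tags = vm.tags or {}
        if tags.get("autoshutdown", "").lower() != "true":
            continue
        # The resource group sits at a fixed position in the resource id:
        # /subscriptions/<sub>/resourceGroups/<rg>/providers/...
        resource_group = vm.id.split("/")[4]
        print(f"Deallocating {vm.name} in {resource_group}")
        # begin_deallocate stops the VM and releases the compute allocation
        client.virtual_machines.begin_deallocate(resource_group, vm.name).wait()


if __name__ == "__main__":
    deallocate_tagged_vms(os.environ["AZURE_SUBSCRIPTION_ID"])
```

Deallocating (rather than just stopping) releases the compute allocation, so a forgotten VM only keeps paying for its storage.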
So what did we do
Got a rack at a colo with a 100 Mb/s connection, not impressive but more than enough for playing around, plus 32 IP addresses, with IPv6 almost there
Bought old servers; if someone decommissioned an old server we bought it rather than have them throw it out
This meant that, at very little cost, our playground went from a few old HP servers to a fully stuffed C7000 with 16 blades, and now back to rack servers
The cost between brokers and used hardware has been in the range of 1-2 hours per person per month; most of the old blades we gave away as we had no use for them, so they live on helping other places, same for the C7000
Rack+power is covered, but we don't power on servers that aren't used; we try to be reasonable, as there is no point in having 5 hosts online that aren't used for anything
Moving from 1G to 10G cost a bit for the switch, and we don't have RDMA or 25/50/100G in scope, but 10G is good enough for most of the testing as it isn't performance we are benchmarking
And for S2D on Windows Server 2016 we ended up with 4 NVMe boards and 20 SSDs, again small sizes and not enterprise class, but good enough for testing and way better than simulating everything in VMs
For storage we mostly have DAS, but we also have a few NetApps, one of which was rented out to a customer and then returned, so it's "free" as the rent covered the base cost.
How does it look
We try to structure it into Demo, Playground and Reserved
Demo: Stable environment with controlled changes so it's always ready for customer demos; no breaking Demo allowed, except for named VMs used for DR/backup
Playground: Anything goes, don't expect anything to be rock solid; still, don't break anything on purpose, and send an email to warn others if you are doing crazy stuff
Reserved: Named host / part of host / VLAN, dedicated to a named "user"
And when I say try, it's because during the whole Windows refresh cycle rebuilds happen more often than we would prefer, but it will get more stable as we move toward GA of the 2016 wave
And once in a while breaking access requires heading onsite (killed the firewall with an upgrade the night before a holiday)
Learnings are: DO NOT KEEP any production on any part of the environment; separate compute/storage/firewall/IP, everything
We have some servers in the same rack that are "production", but the only thing shared is the dual PDUs, and we haven't broken those yet
Licensing is covered mostly by trials and a few NFRs, which again ties back to refreshing whole environments
What can’t we do currently ?
top of mind items for now
No NSX
No RemoteFX/OpenGL
No FC integration (currently)
No TOR integration (aka no Arista)
No Performance Testing (consumer grade SSDs, no RDMA, 10G)
No Very Large Scale Testing (2k+ VMs; with dedupe and delta disks we probably could, but not with full VMs)
No Virtual Connect or any fancy blade features from other vendors (but again, we don't believe in blades anymore)
No hardcore fault domains
Wish List for the near future
RDMA / high-performing storage
More Azure running, aka balancing credits against features, aka send more money
Wish List for 2018+
Everything in Azure
/Flemming, comments at [email protected] or @flemmingriis