K8Guard

K8Guard is Officially Open Source

I am happy to announce that Target has open sourced K8Guard. I have been part of designing and developing it for the past few months, and I’m going to share a little more about it.

What is K8Guard?

K8Guard is an auditing system for Kubernetes clusters. It monitors different entities on your cluster for possible violations. K8Guard notifies the violators and then takes action on them. It also provides metrics and dashboards about violations in the cluster through Prometheus.

How to Pronounce It?

Like Kate Guard - the guardian angel for your Kubernetes clusters.

Why?...

Measuring the Performance of our OpenStack Cloud

Here at Target, we run our own private OpenStack cloud and have never been able to accurately measure the performance of our hardware. Without that measurement, we cannot evaluate the performance gains from new hardware or from alternative technologies running as drivers inside OpenStack. It also prevents us from providing a Service Level Agreement (SLA) to our customers. Recently we have been striving to improve our OpenStack service, which led us to talk to our consumers directly.

Target and Elasticsearch: Maintaining an ELK stack over Peak Season

One of the strongest benefits of launching an application into the cloud is the pure on-demand scalability that it provides. I’ve had the privilege of working with the ELK stack (Elasticsearch, Logstash, Kibana) for log aggregation for the past two years. When we started, we were pleased with our search and query times with tens of gigabytes of data in the production cluster. When Peak time hit, we reveled as our production clusters successfully managed half a terabyte of data (!). During peak, Target hosted 14 Elasticsearch clusters in the cloud containing more...

Surviving (and thriving) Through Peak Season 2016 on the Digital Observability Team

It is 7:30 AM on a Monday morning in late October. I am waiting in line at Cafe Donuts to bring my team breakfast for our mandated ‘no work for one hour’. We just wrapped up a strenuous week of implementing a major upgrade to our Elastic logging cluster. Many digital teams are relying on this upgrade to position themselves to confidently monitor their application health during the most important day in retail - Black Friday. It was a successful, much-anticipated upgrade that resulted in many hours of overtime, late-night calls, and cross-team performance tests. Morale is high,...

Hadoop Rolling Upgrades

Hadoop upgrades over the last few years meant long outages where the Big Data platform team would shut down the cluster, perform the upgrade, start services, and then complete validation before notifying users it was OK to resume activity. This approach is a typical pattern for major upgrades even outside Target, and it reduces the complexity and risks associated with the upgrade. While this worked great for the platform team, it was not ideal for the hundreds of users and thousands of jobs that were dependent on the platform. That is why we decided to shake things up and go all in...

How (and Why) We Moved to Spinnaker

Background

Just after the middle of last year, Target expanded beyond its on-prem infrastructure and began deploying portions of target.com to the cloud. The deployment platform was homegrown (codename Houston), and was backed wholly by our public cloud provider. While in some aspects that platform was on par with other prominent continuous deployment offerings, the actual method of deploying code was cumbersome and did not adhere to cloud best practices. These shortcomings led to a brief internal evaluation of various CI/CD platforms, which in turn led us to Spinnaker. We chose Spinnaker because it integrates with CI tools we already use...