(Data) Science or Witchcraft?

Demystifying Data Science and Engineering

On my first encounter with it, around the early 2010s, I was mystified. It sounded like witchcraft, and I imagined the practitioners to be a coven of witches and wizards, all holding Ph.D.s in the dark art of “Data Science” and being respectfully addressed as “Data Scientists”. It was believed they would magically transform haystacks into gold and then ask for your first-born in return as a reward for their service (à la Rumpelstiltskin). There is no denying that the title “Data Scientist” is the most coveted one these days and has a nice ring to it. It’s also true that data science has traditionally been a monopoly of mathematicians and statisticians. Obviously, developing statistical models and machine learning algorithms requires years of training and practice to specialize in. In my opinion it is more of an art form driven by science, and it can easily be mistaken for magic.

It’s common knowledge that the more experienced in life we get, the easier it is for us to make up our minds. For instance, “Which diner to pick for a boys’ night out?”, “When to stay off highways to avoid being stuck in a traffic jam?”, “When to buy a house? When NOT to buy?” are all decisions we make every day. This ability comes as a result of years of learning from implicit experience (a.k.a. unsupervised learning) and explicit instructions from parents, teachers, friends, family, and media (a.k.a. supervised learning). Our brain builds models of the world, of the situations we have been in, of the banal and the extraordinary, of the nice and the not-so-nice, of the appropriate and the inappropriate. These models facilitate judgment, govern behavior, and enable anticipation of likely outcomes. That’s basically data science. The recent progress in large-scale and high-performance computing has opened the door for such complex calculations to be performed on demand and much more efficiently than was possible before. Hence, the buzz!

At Target we operate in a guest-centric universe. We don’t treat our guests as a statistic or as just a data point in a trend; our guests are setting their own micro-trends. Our focus is on carefully picking up signals from our guests and learning what actually matters to them. Yes! We are seriously trying to understand every single guest’s needs independently. Therefore, we strive for a fully personalized experience, not just suggestions based on what is popular out there. As John Fairchild reportedly said:

“Style is an expression of individualism mixed with charisma. Fashion is something that comes after style.”

At Target, Data Science and Engineering is a group that has gone beyond conventional boundaries in terms of scale and applicability. To us, the journey to the ultimate goal of a 100% personalized experience began with attention to detail: what our choice of algorithms would be, how we would organize the data, what level of pre-processing would be enough, how frequently compute cycles should run, what strict SLAs the APIs we expose must meet, and so on. We were very clear that we needed to provide a consistent experience before worrying about a fully personalized one; establish reliability before making bunnies appear out of a hat. And our motto has been pretty simple: no matter what channel our guests use to interact with us, we want their experience to remain consistent and pleasant.

What we do is driven by non-trivial problems that mandate an ensemble of approaches. It is a slow and deliberate process of trial and error, experimentation with bleeding-edge algorithms and technologies, and at times dialogue with peers and stakeholders that can run for days. We have played with off-the-shelf algorithms as well as built new techniques. It is such a delight to witness the speed with which new ideas emerge from the insight built through models that use big data as input. We can understand our guests’ behavior and needs better because of our omni-channel retail capabilities.

“It is such a delight to witness the speed with which new ideas emerge from the insight built through models that use big data as input.”

Data science and engineering teams that are tasked with solving business problems on quick turnaround times have a burning need to pre-process, ingest, store, process, and retrieve large amounts of data fairly quickly, without compromising on security, quality, or privacy. In our first year we have mostly focused on the design of long-running data pipelines, multi-phase compute workflows, a highly scalable API layer, and monitoring capabilities, all aiming for the fastest turnaround time. To name a few techniques, we have used Collaborative Filtering for behavior-based recommendations, TF-IDF for feature selection, and K-Means for clustering. Our exploration has a wider range, though, as we are deeply interested in commoditizing data science for all our product teams within Target.
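To make those three techniques concrete, here is a minimal, self-contained sketch in Python using scikit-learn and NumPy. The product descriptions and the guest-by-item interaction matrix are made up purely for illustration; this is a toy demonstration of the ideas, not our production pipeline.

```python
# Illustrative sketch only: TF-IDF features, K-Means clustering, and
# item-item collaborative filtering on made-up data.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans
from sklearn.metrics.pairwise import cosine_similarity

# --- TF-IDF: turn (hypothetical) product descriptions into weighted features ---
descriptions = [
    "red cotton t-shirt short sleeve",
    "blue denim jeans slim fit",
    "cotton crew socks pack",
    "stainless steel water bottle",
]
tfidf = TfidfVectorizer()
features = tfidf.fit_transform(descriptions)  # sparse term-weight matrix

# --- K-Means: cluster products by their TF-IDF features ---
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(features)
print("cluster labels:", labels)

# --- Item-item collaborative filtering on a toy guest-by-item matrix ---
# Rows are guests, columns are items; entries are implicit interaction counts.
interactions = np.array([
    [3, 0, 1, 0],
    [2, 1, 0, 0],
    [0, 0, 4, 2],
    [0, 1, 3, 1],
])
item_sim = cosine_similarity(interactions.T)  # similarity between items
# Score items for guest 0 by similarity to what they already interacted with,
# then mask out the items they already have.
scores = interactions[0] @ item_sim
scores[interactions[0] > 0] = -np.inf
print("top recommendation for guest 0: item", int(np.argmax(scores)))
```

In practice, of course, these steps run at a completely different scale on platforms like Spark and Hadoop, but the underlying ideas are the same.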

A key element in our ability to execute has been our choice to go with open source almost every single time we had to make a technology decision, with only a few exceptions here and there. We also rely on the contemporary software engineering and management principles proposed by the Agile development model, which has served us very well. Our engineers have a very wide variety of skills: statistics, machine learning, and visualization techniques, delivered through Java, Scala, Python, Spark, R, Hadoop, and many more technologies. That’s how we transform abstract ideas into practical and well-engineered products.

We are doing some kick-ass engineering, with a focus on building value-driven products through technology. And that’s what data engineers do. For us, no idea is too big to try, and a failure is just a null hypothesis we are trying to reject. Our dream is to go where no one has dared to go before and bring back the riches.

“No idea is too big to try, and failure is just a null hypothesis we are trying to reject.”

Sometimes the data is big and sometimes it is small. Sometimes we need statistics, and other times we need just a few occurrences to infer something. Most of the time we can explain it, and sometimes we just conjecture about it. I must admit that working with data can be hard due to the volume, variety, and veracity of the data. It can test the limits of one’s competence as well as patience. The patterns in the data, or the patterns of data engineering, are never easy to identify, and hence the approach to solving each problem has to be defined as we go. Data Science and Engineering may very well be witchcraft; all it needs is wizards and witches like us to master it. With strong analytical skills, a good hold on foundational mathematics, some engineering background, and the stamina to keep up with the exponential learning curve, you can be a data scientist/engineer.

Most of the problems we encounter are pretty unique, and solving data-driven problems is very exciting because you can confirm or reject your ideas quickly, making it a very rewarding experience as an engineer. We receive directions and guidance from our Product Management and Merchandising teams on how the business is shaping up and which objectives are critical in the coming quarter or year, and we start defining the problems and breaking them down into tasks. This is pretty similar to general software engineering, but the key differences are the nature of the problems we solve, the scale at which the solution must work, the number of end users impacted by our work, and, most importantly, measuring and making sense of that impact.

In my first year at Target working as a Principal Data Engineer, there have been no dull moments, except the ones when I had to catch up on my sleep. So as I draw this first installment of the DSE Update to a close, I want to leave you with a thought.

“All Science is Data Science. Because without data, there can be no science. And all of us are Data Scientists, while some of us pursue it as a career.”

About the Author

Product Recommendations (Personalization Engine) Lead and Principal Data Engineer at Target, delivering value-driven service at scale.