Lean (Machine) Learning
Thomas Dickerson is a native Vermonter and a 4th-year PhD student in the Dept. of Computer Science at Brown University, with a research focus on applying computer science theory to interdisciplinary problems. He and Christopher Mitchell (a post-doctoral researcher at NYU) are the co-founders of geopipe, a startup that leverages cloud computing to turn Big Data into expansive virtual models of the real world. They met through an online community dedicated to their shared love for programming TI graphing calculators.
Thomas previously co-founded the Vermont Sustainable Heating Initiative, an award-winning 501(c)(3) non-profit focused on local energy policy in the home-heating sector, and still serves on its board of directors. Thomas shares his journey from academia to entrepreneurship – applying the algorithmic lens to startup methodology. Here are his key takeaways:
Matters of Perspective
As researchers with a primarily academic background, the NYU Summer Launchpad program has been a particularly novel learning experience for the geopipe team. We faced a number of cultural differences between life as computer scientists and life as entrepreneurs.
For example, we were told early in the process that our greatest weakness as entrepreneurs would be that we, as researchers, would get bored unless we were totally in love with what we were working on. Whereas academic papers and collaborations are typically short-term commitments for a couple of months, startups are years in the making.
We also had to overcome a natural tendency to understate our confidence in our hypotheses and results while presenting. I recently attended the “Symposium on Logic in Computer Science” and listened to a talk in which the researchers announced results for a certain class of problems when the input value was ≤ 7 or when it was ≥ 9. An audience member asked about 8 and the response was “we had a proof sketch by the submission deadline, but the details were still fuzzy, so it didn’t feel right to include it in the presentation”. From an entrepreneurial perspective, this anecdote is humorous, bordering on absurd, but most mathematicians or theoretical computer scientists would see this as having been exactly the correct thing to do. Similarly, particle physicists are loath to announce results unless their statistical analysis puts the odds of being incorrect at smaller than 1 in 3.5 million.
But there are also similarities. Early in the summer, Frank said something along the lines of “don’t bother drawing up a 5-year business plan. It’s a ton of work, and you’ll just end up throwing it in the trash when it’s proven wrong in 6 months”, and I immediately chuckled and whispered to Christopher “the gap between theory and practice is smaller in theory than in practice.” Of course, in the world of economic decision making specifically, computer scientists have strong proofs of why this is the case: finding profit-maximizing market equilibria is known to be computationally intractable, as is even finding approximate equilibria. There are also other places where computational tools provide a useful lens for understanding the problems that startups face, and I’ll discuss the two that particularly struck me over the course of the summer.
Making Decisions Without Statistical Significance
One of the mantras this summer has been “don’t worry about statistical significance, you’ll start seeing trends before that”. For many in the sciences, this sort of advice is incredibly painful. You’re likely to hear “how can we properly reject hypotheses without a statistically significant signal?” from a statistician or a physicist, and “humans are incredibly bad at interpreting probabilities and incredibly good at tricking ourselves into seeing imaginary signals in a field of noise” from a psychologist.
However, it turns out that computer science has an entire subfield dedicated to solving exactly this sort of problem: how to correctly approach the exploration/exploitation tradeoff in a particular problem space. Traditional machine learning techniques (e.g. SVMs) are essentially extensions of traditional statistical methods, and deal with the analysis of large datasets that must be available up front. The field of “reinforcement learning”, by contrast, deals with rigorous learning in an online context (that is, where you must make decisions and act on each new data point as it arrives).
The basic framework for analyzing such problems is referred to as a “Markov Decision Problem”, in which an agent explores a network of states, each of which provides a set of possible actions that will probabilistically transition the agent to another state, and simultaneously either reward or punish it. The natural question to ask about such problems is which policy mapping states to actions will have the best outcome over a suitably defined “long run”. When the whole network is known, it is possible to solve such problems exactly, in reasonably efficient amounts of time. Of course, in the real world, it is commonly the case that we do not exactly know what probabilities or rewards are associated with our actions, in which case we are in the Reinforcement Learning context. The tools which have been developed to solve such problems are incredibly powerful, and the results have recently been in the news.
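To make the "whole network is known" case concrete, here is a minimal sketch in Python using value iteration, one standard way to solve a fully known Markov decision problem. Everything specific in it is an assumption made for illustration: the toy states ("idea", "product"), the action names, the transition probabilities, the rewards, and the discount factor `gamma`, which stands in for the "suitably defined long run".

```python
from typing import Dict, List, Tuple

# transitions[state][action] = list of (probability, next_state, reward)
Transitions = Dict[str, Dict[str, List[Tuple[float, str, float]]]]


def q_value(outcomes: List[Tuple[float, str, float]],
            values: Dict[str, float], gamma: float) -> float:
    """Expected reward of one action plus the discounted value of where it leads."""
    return sum(p * (r + gamma * values[nxt]) for p, nxt, r in outcomes)


def value_iteration(transitions: Transitions, gamma: float = 0.9,
                    tol: float = 1e-9):
    """Return (state values, best action per state) for a fully known network."""
    values = {s: 0.0 for s in transitions}
    while True:
        delta = 0.0
        for s, actions in transitions.items():
            best = max(q_value(o, values, gamma) for o in actions.values())
            delta = max(delta, abs(best - values[s]))
            values[s] = best
        if delta < tol:  # stop once no state's value changes noticeably
            break
    policy = {
        s: max(actions, key=lambda a: q_value(actions[a], values, gamma))
        for s, actions in transitions.items()
    }
    return values, policy


# Hypothetical toy network: building succeeds half the time; selling pays 1 per step.
toy: Transitions = {
    "idea": {
        "wait": [(1.0, "idea", 0.0)],
        "build": [(0.5, "product", 0.0), (0.5, "idea", 0.0)],
    },
    "product": {
        "sell": [(1.0, "product", 1.0)],
    },
}

values, policy = value_iteration(toy, gamma=0.5)
print(policy)
```

In the reinforcement learning setting the `transitions` table is not available up front, so the agent would have to estimate these quantities from experience while acting; this sketch covers only the known-network case.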
Handling Conflicting Advice
Another problem we’ve faced over the course of the summer is figuring out how to handle conflicting advice from our various mentors and members of the teaching team, in instances where their opinions have differed from one another (and/or from our own). Later, while discussing how we might go about forming an advisory board, one mentor remarked, slightly tongue-in-cheek, that “it’s always easier to fire an advisor than an employee: you can keep them around and just stop listening to them”.
It turns out that this meta-advice shares a lot of similarities with a key algorithmic insight for online algorithms, known as the “multiplicative weights update” rule. For a simple example of how this rule can be applied, let us suppose that we are deciding whether to invest in a particular stock that will either go up or down each day, and that every morning we must predict what will happen that day (predicting incorrectly will cost us a fixed value, say $1). Suppose additionally that we are permitted to consult an arbitrary panel of “experts” (though we have no way of knowing whether they are truly experts). We begin by weighting their advice equally (since we don’t yet know what to expect from them), and each morning take the weighted majority of their advice. At the end of each day, we recalculate the weights we assign to each expert, penalizing each one who predicted incorrectly that day by cutting their relative influence in half. Then, over time, we can prove that we will do nearly as well as the best expert, despite having no prior knowledge of the behavior of the market, or of the quality of our experts.
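The stock-prediction procedure above can be sketched in a few lines of Python. This is an illustrative sketch rather than a definitive implementation: the function name `weighted_majority`, the tie-breaking rule (predict "up" on a tie), and the three made-up experts are all assumptions. For this deterministic halving variant, the standard guarantee is that the number of mistakes is at most roughly 2.41 (m + log2 n), where m is the number of mistakes made by the best of the n experts.

```python
from typing import List, Sequence, Tuple


def weighted_majority(advice_per_day: Sequence[Sequence[bool]],
                      outcomes: Sequence[bool],
                      penalty: float = 0.5) -> Tuple[int, List[float]]:
    """Follow the weighted majority of the experts each day.

    advice_per_day[t][i] is expert i's prediction on day t (True = "up").
    outcomes[t] is what actually happened on day t.
    Returns (number of wrong predictions we made, final expert weights).
    """
    n_experts = len(advice_per_day[0])
    weights = [1.0] * n_experts  # start by trusting everyone equally
    mistakes = 0
    for advice, actual in zip(advice_per_day, outcomes):
        up = sum(w for w, a in zip(weights, advice) if a)
        down = sum(w for w, a in zip(weights, advice) if not a)
        guess = up >= down  # assumption: ties are broken toward "up"
        if guess != actual:
            mistakes += 1
        # Halve the influence of every expert who was wrong today.
        weights = [w * penalty if a != actual else w
                   for w, a in zip(weights, advice)]
    return mistakes, weights


# Hypothetical panel: one expert who is always right, one who is always
# wrong, and one who always says "up".
outcomes = [True, False, True, True, False, False]
advice_per_day = [[actual, not actual, True] for actual in outcomes]

mistakes, weights = weighted_majority(advice_per_day, outcomes)
print(mistakes, weights)
```

Note that no expert is ever removed from the panel; the unreliable ones simply lose influence geometrically, which is the algorithmic counterpart of continuing to keep an advisor around while no longer listening to them.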
While this is a relatively simple example, it turns out that this same principle can be generalized to approximate a wide variety of hard decision-making problems.
While the world of startups and business development is far removed from the world of academia, many of the hard decisions we have to make are still inherently computational in nature: we can quantify our knowledge and uncertainty, and we can quantify the possible risks and rewards of each decision, and apply algorithmic tools to help us choose well. Of course many of the problems we face in the business world are of enormous complexity and resist attempts to solve them directly, but even here, the algorithmic lens allows us to classify our problems and contemplate how we might solve them with enough information, and this sort of insight is valuable in itself.