Jeremy Howard Margrit Zwimmer
Who would consider themselves an entrepreneur, or aspires to have a startup soon? Just raise your hand. And then science: who would consider themselves a scientist, or has a PhD, or is in academia? So cool. And Jeremy, what's your background: scientist, entrepreneur or both? Data scientist.. so my background is in philosophy.. I spent 8 years in management consulting at McKinsey doing business strategy at legal companies. I am nearly fully recovered. This is my third startup. My first two startups were self-funded.. one was about actuarial algorithms and the other was FastMail. Now I have a venture-funded startup called Kaggle that runs data science competitions.
Unlike the other two companies, Kaggle is not yet building products and selling them at a profit. Yet somehow, because we are a venture-funded company, we get a lot more press. Kaggle gets a lot of publicity, and there's something about getting VC in the Bay Area.. you know, people congratulate you. I had two startups before, and I hope Kaggle will be successful too.. but you have to do it at a different scale this way. When you're self-funded, you just want to get to the point where you're making enough money to pay your staff, or yourself, or hopefully to a point where you can hand it off to someone else, where it's enough money that you feel comfortable.
For a VC-backed startup, you have to launch it into outer space, because investors have the right to decide whether or not your company xyz, and they're not interested in anything other than a hugely exciting exit. So that means Kaggle will either be huge or it will die a painful death. It's about acceleration and leverage, whereas my previous startups were about making sure we were profitable at each stage, and so on. The important thing to remember is that for startup founders there are only two ways to fail. Way number one is that you give up. Way number two is that you run out of money. Don't do either of those. Stay in control of your cash flow and, by definition, you are succeeding.
Let's talk about what Kaggle does, for people who don't know. What's the audience, and who are the users? Kaggle is a very interesting type of business, it shares it with some of these businesses, it's a platform mediated flower sketchy sketch. Kaggle solves data mining problems for NASA, Allstate, Wikimedia; we've run over 100 competitions. We solve these data mining problems in much the same way we answered the question about how to find interesting things to say.. I outsourced it to you. We outsource it to 20,000 data scientists. So the one thing you might wonder is how you would get those 20k people to work together. Well, you don't: you get them to compete against each other, to be better than each other.
Predictive modeling is interesting because you can take someone's answer and create a score in real time for how good it is. So what about creating credit scores? Which people are going to default on their loans and which aren't? The bank can check your answers and see that you got 70% of them right and Jeremy got 65% right, so you win.. just as you can in golf tournaments or tennis. Kaggle is the first true meritocracy outside the world of sports: whoever makes the most predictive model wins.. prizes are $300 on the low end and $3M on the high end.
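The scoring idea described here (check each entrant's answers against known outcomes, then rank by the share they got right) can be sketched in a few lines of Python. This is only an illustrative sketch, not Kaggle's actual scoring code: the function names `accuracy` and `leaderboard`, the helper `flip_first`, and the 20 made-up loan outcomes are all assumptions chosen so the numbers match the 70% and 65% in the example above.

```python
from typing import Dict, List, Tuple


def accuracy(predictions: List[int], truth: List[int]) -> float:
    """Fraction of predictions that match the known outcomes."""
    if len(predictions) != len(truth):
        raise ValueError("predictions and truth must have the same length")
    correct = sum(p == t for p, t in zip(predictions, truth))
    return correct / len(truth)


def leaderboard(
    submissions: Dict[str, List[int]], truth: List[int]
) -> List[Tuple[str, float]]:
    """Score every submission and rank entrants from best to worst."""
    scored = [(name, accuracy(preds, truth)) for name, preds in submissions.items()]
    return sorted(scored, key=lambda entry: entry[1], reverse=True)


def flip_first(values: List[int], n: int) -> List[int]:
    """Hypothetical helper: copy `values` with the first n answers made wrong."""
    return [1 - v if i < n else v for i, v in enumerate(values)]


# Made-up held-out outcomes: 1 = defaulted on the loan, 0 = repaid.
truth = [1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0]

submissions = {
    "jeremy": flip_first(truth, 7),  # 13 of 20 right
    "you": flip_first(truth, 6),     # 14 of 20 right
}

board = leaderboard(submissions, truth)
for rank, (name, score) in enumerate(board, start=1):
    print(f"{rank}. {name}: {score:.0%}")
```

Plain accuracy is used here only because it mirrors the "70% right versus 65% right" example; a real competition could rank entrants on a different metric.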
Not only are you able to introduce meritocracy, but you can open up the playing field to a much larger audience, which seems to be a consistent theme.. can you give some examples? Our very first proper competition was 2 years ago, for a research group that was looking at the progression of HIV in patients and trying to predict it using genomic markers. The error rate from academia was cut by 30%. Second place was the IBM Watson research center, the same people who created the Jeopardy-playing system. First place was an English dropout who taught himself machine learning. This was published in Science magazine, and it was created by somebody that other people wouldn't normally have recognized. This has happened again and again: hundreds of competitions have resulted in breakthroughs beyond previous research.
Are there any downsides to this approach? The key downside is that it requires organizations to open their data to the internet. It requires, or rather we encourage, the IP handling to be done in a fairly public way. The state of the art in machine learning can be improved this way. What we've done about this is that we have a private competition, which most commercial orgs use, where we invite 10 or 15 of our successful participants. We look at the orthogonality of their previous solutions and we invite them to compete in secret under NDA. That way, companies can leverage the power of the competition, which really drives synergies, while avoiding these data privacy issues.