Jeremy Howard Margrit Zwimmer

Who would consider themselves an entrepreneur, or aspires to soon have a
startup? Just raise your hand. And then science, who would consider themselves
a scientist, or has a PhD, or would be in academia? So cool. And Jeremy,
what's your background, scientist, entrepreneur or both? Data scientist.. so my
background is in philosophy.. I spent 8 years in management consulting at
McKinsey doing business strategy at legal companies. I am nearly fully
recovered. This is my third startup. My first two startups were self-funded..
one was about actuarial algorithms and the other was FastMail. Now I have a
venture-funded startup called Kaggle that runs data challenge competitions.

Unlike the other two companies, Kaggle is not yet building products or selling
them at a profit. Yet somehow, because we are a venture-funded company, we get
a lot more press. Kaggle gets a lot of publicity and there's something about
getting VC in the Bay Area.. you know, people congratulate you. I had two
startups before, and Kaggle hopes to be successful too.. but you have to do it
at a different scale this way. Self-funded, you just want to get to the point
where you're making enough money to pay your staff, or yourself, or hopefully
you can get to a point where you can hand it off to someone else, where it's
enough money that makes you feel comfortable.

For a VC-backed startup, you have to launch it into outer space because
investors have the right to decide whether or not your company xyz, and they're
not interested in anything other than a hugely exciting exit. So that means
Kaggle is either huge or it will die a painful death. It's about acceleration,
leverage, whereas my previous startups were about making sure we were
profitable at each stage, and so on. The important thing to remember is that
for startup founders there are only two ways to fail. Way number one is that
you give up. Way number two is that you run out of money. Don't do either of
those. Stay in control of your cash flow and keep going, and by definition you
are succeeding.

Let's talk about what Kaggle does, for people who don't know. What's the
audience, and who are the users? Kaggle is a very interesting type of
business, a type it shares with some of these other businesses: it's a platform
mediated flower sketchy sketch. Kaggle solves data mining problems for NASA,
Allstate, Wikimedia; we've run over 100 competitions. We solve these data
mining problems in much the same way I answered the question about how to find
interesting things to say.. I outsourced it to you. We outsource it to 20,000
data scientists. So the one thing you might wonder is how would you get those
20k people to work together? Well you don't, you get them to compete against
each other to be better than each other.

Predictive modeling is interesting because you can take someone's answer and
create a score in real time about how good it is. So what about creating credit
scores? Which people are going to default on their loans and which aren't? The
bank can check your answers, and see that you got 70% of them right, and Jeremy
got 65% right, so you win.. it works like golf tournaments or tennis. Kaggle is
the first true meritocracy outside the world of sports: whoever makes the most
predictive model wins.. prizes are $300 on the low end and $3M on the high end.
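
The scoring idea in the loan example can be sketched in a few lines of Python.
This is only an illustration under stated assumptions, not Kaggle's actual
scoring code: the function names (`accuracy`, `leaderboard`, `flip`), the
20 made-up loan outcomes, and the choice of plain accuracy as the metric are
all hypothetical. The host keeps the true outcomes hidden, scores each
submission against them, and ranks participants by score.

```python
from typing import Dict, List, Set, Tuple


def accuracy(predictions: List[int], answers: List[int]) -> float:
    """Fraction of held-out cases that a submission labels correctly."""
    if len(predictions) != len(answers):
        raise ValueError("a submission must cover every held-out case")
    correct = sum(p == a for p, a in zip(predictions, answers))
    return correct / len(answers)


def leaderboard(
    submissions: Dict[str, List[int]], answers: List[int]
) -> List[Tuple[str, float]]:
    """Score every submission and rank participants from best to worst."""
    scores = {name: accuracy(preds, answers) for name, preds in submissions.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def flip(labels: List[int], positions: Set[int]) -> List[int]:
    """Helper for the demo: copy the labels but get the given positions wrong."""
    return [1 - y if i in positions else y for i, y in enumerate(labels)]


# Made-up hidden outcomes for 20 loans: 1 = defaulted, 0 = repaid.
answers = [1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0]

# Two hypothetical submissions: one with 6 mistakes, one with 7.
submissions = {
    "you": flip(answers, {0, 4, 7, 11, 14, 18}),
    "jeremy": flip(answers, {1, 3, 6, 9, 12, 15, 19}),
}

board = leaderboard(submissions, answers)
for rank, (name, score) in enumerate(board, start=1):
    print(f"{rank}. {name}: {score:.0%}")
```

Because the score is computed mechanically from hidden answers, the ranking
depends only on how well a model predicts, not on who submitted it. A real
competition would likely use a metric suited to the problem rather than plain
accuracy.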

Not only are you able to introduce meritocracy, but you can open up the playing
field to a much larger audience, which seems to be a consistent theme.. can you
show some examples? Our very first proper competition, 2 years ago, was for a
research group who was looking at the progression of HIV in patients and trying
to predict it using genomic markers. The error rate from academia was cut by
30%. Second place was the IBM Watson research center, the same people who
created the Jeopardy-playing machine. First place was an English dropout who
taught himself machine learning. This was published in Science magazine, and it
was created by somebody that other people wouldn't normally have recognized.
This has happened again and again, hundreds of competitions have resulted in
breakthroughs beyond previous research.

Are there any downsides to this approach? The key downside is that it requires
organizations to open their data to the internet. It requires, or we encourage,
the IP handling to be done in a fairly public way. The state of the art in
machine learning can be improved this way. What we've done about this is that
we have a private competition, most commercial orgs use this, where we invite
10 or 15 of our successful participants: we look at the orthogonality of their
previous solutions and we invite them to compete in secret under NDA. That way,
companies can leverage the power of the competition, which really drives
synergies, while avoiding these data privacy issues.
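
One way to picture "looking at the orthogonality of previous solutions" is to
compare how correlated participants' past predictions are and to invite people
whose approaches overlap least. The talk does not say how Kaggle actually
measures this, so the sketch below is purely an assumption: the names
(`pearson`, `pick_diverse`), the greedy selection rule, and the toy prediction
vectors are all hypothetical.

```python
from math import sqrt
from typing import Dict, List


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation of two equal-length, non-constant prediction vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def pick_diverse(past: Dict[str, List[float]], first: str, k: int) -> List[str]:
    """Greedily invite up to k people, starting from `first`.

    Each step adds the candidate whose strongest absolute correlation with
    anyone already invited is the smallest, i.e. the most "orthogonal" one.
    """
    chosen = [first]
    k = min(k, len(past))
    while len(chosen) < k:
        remaining = [name for name in past if name not in chosen]
        best = min(
            remaining,
            key=lambda name: max(abs(pearson(past[name], past[c])) for c in chosen),
        )
        chosen.append(best)
    return chosen


# Toy predictions from three hypothetical participants on the same five cases.
past = {
    "ann": [0.9, 0.1, 0.8, 0.2, 0.7],
    "bob": [0.8, 0.2, 0.9, 0.1, 0.6],  # tracks ann closely
    "cat": [0.5, 0.6, 0.4, 0.9, 0.8],  # behaves differently from ann
}

invited = pick_diverse(past, first="ann", k=2)
print(invited)
```

In this toy data, `bob` largely duplicates `ann`, so the second invitation goes
to `cat`. Inviting people whose models make different mistakes is a common
reason to care about diversity, since dissimilar solutions tend to combine
better than near-duplicates.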