I like the answers already given, but let me complement them with a different (and more tongue-in-cheek) approach.
Suppose we collect a bunch of observations from 1000 random people to find out whether punches in the face are associated with headaches:
$$\text{Headaches} = \beta_0 + \beta_1 \,\text{Punch\_in\_the\_face} + \varepsilon$$
The error term $\varepsilon$ contains all the omitted variables that produce headaches in the general population: stress, how polluted your city is, lack of sleep, coffee consumption, etc.
For this regression, $\beta_1$ might be very large and highly significant, but the $R^2$ will be low. Why? For the vast majority of the population, headaches won't be explained much by punches in the face. In other words, most of the variation in the data (i.e. whether people have few or many headaches) will be left unexplained if you only include punches in the face, but punches in the face are VERY important for headaches.
Graphically, this probably looks like a steep slope with a lot of scatter around it.
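If you want to see this happen, here is a minimal simulation sketch of the story above. All the numbers are my own assumptions for illustration, not real data: I treat the punch as a yes/no variable, assume about 5% of people got punched, that a punch adds 3 headaches on average, and that everything in the error term has a standard deviation of 2. The helper names `simulate` and `simple_ols` are made up for this example.

```python
import math
import random


def simulate(n=1000, p_punch=0.05, beta0=1.0, beta1=3.0, noise_sd=2.0, seed=0):
    """Simulate headaches = beta0 + beta1 * punch + noise for n people.

    All parameter values are illustrative assumptions, not estimates.
    """
    rng = random.Random(seed)
    # Rare binary regressor: 1.0 if the person was punched, else 0.0.
    punch = [1.0 if rng.random() < p_punch else 0.0 for _ in range(n)]
    # The noise stands in for stress, pollution, sleep, coffee, etc.
    headaches = [beta0 + beta1 * x + rng.gauss(0.0, noise_sd) for x in punch]
    return punch, headaches


def simple_ols(x, y):
    """Fit y = b0 + b1 * x by least squares.

    Returns (b0, b1, R^2, t-statistic of b1).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    ss_res = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r2 = 1.0 - ss_res / ss_tot
    # Standard error of the slope under the usual homoskedastic OLS formula.
    se_b1 = math.sqrt(ss_res / (n - 2) / sxx)
    return b0, b1, r2, b1 / se_b1


punch, headaches = simulate()
b0, b1, r2, t_b1 = simple_ols(punch, headaches)
print(f"slope = {b1:.2f}, t-statistic = {t_b1:.1f}, R^2 = {r2:.3f}")
```

Under these assumptions you should get a slope near 3 with a large t-statistic, while the $R^2$ stays small, because so few people are punched that punches account for only a sliver of the total variation in headaches.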