The entropy tells you how much uncertainty there is in the system. Say you're looking for your cat, and you know it's somewhere between your house and the neighbor's, which is 1 mile away. Your kids tell you that the probability of the cat being at distance x from your house is best described by the beta distribution f(x; 2, 2). So the cat could be anywhere between 0 and 1, but it's more likely to be in the middle, i.e. the mode is at x = 1/2.
Let's plug the beta distribution into your entropy equation; you get H ≈ −0.125.
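You can verify that number by computing the differential entropy −∫ f ln f dx numerically. A quick sketch in Python with scipy (numerical quadrature in place of the closed form):

```python
import math
from scipy import integrate, stats

# Differential entropy H = -∫ f(x) ln f(x) dx of Beta(2, 2) on [0, 1]
f = stats.beta(2, 2).pdf
h_beta22, _ = integrate.quad(lambda x: -f(x) * math.log(f(x)), 0, 1)
print(round(h_beta22, 3))  # ≈ -0.125
```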
Next, you ask your wife, and she tells you that the uniform distribution best describes her knowledge of your cat's whereabouts. If you plug it into your entropy equation, you get H = 0.
Both the uniform and the beta distribution let the cat be anywhere between 0 and 1 miles from your house, but there's more uncertainty in the uniform: your wife really has no clue where the cat is hiding, while the kids have some idea — they think it's more likely to be somewhere in the middle. That's why the beta's entropy is lower than the uniform's.
You might try other distributions. Maybe your neighbor tells you the cat likes to be near either of the houses, so his beta distribution has α = β = 1/2. Its H must again be lower than that of the uniform, because you get some idea about where to look for the cat. Is your neighbor's information entropy higher or lower than your kids'? I'd bet on the kids any day on these matters.
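For concreteness, scipy can rank all three guesses directly; `entropy()` on a frozen continuous distribution returns the differential entropy:

```python
from scipy import stats

# Differential entropies of the three candidate distributions on [0, 1]
h_kids = float(stats.beta(2, 2).entropy())        # kids: mass in the middle
h_wife = float(stats.uniform(0, 1).entropy())     # wife: no clue, H = 0
h_neighbor = float(stats.beta(0.5, 0.5).entropy())  # neighbor: mass near the houses

# Both informative guesses beat the uniform, and the neighbor's
# U-shaped guess turns out to carry even less uncertainty than the kids'
print(h_neighbor, h_kids, h_wife)
```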
UPDATE:
How does this work? One way to think about it is to start with a uniform distribution. If you agree that it's the one with the most uncertainty, then think of disturbing it. Let's look at the discrete case for simplicity. Take Δp from one point and add it to another, as follows:
$$p_i' = p_i - \Delta p, \qquad p_j' = p_j + \Delta p$$
Now, let's see how the entropy changes:
$$H - H' = \big(p_i'\ln p_i' + p_j'\ln p_j'\big) - \big(p_i\ln p_i + p_j\ln p_j\big)$$

Since we start from the uniform distribution, $p_i = p_j = p$:

$$= (p-\Delta p)\ln\!\left[p\left(1-\tfrac{\Delta p}{p}\right)\right] + (p+\Delta p)\ln\!\left[p\left(1+\tfrac{\Delta p}{p}\right)\right] - 2p\ln p$$

$$= (p-\Delta p)\ln\!\left(1-\tfrac{\Delta p}{p}\right) + (p+\Delta p)\ln\!\left(1+\tfrac{\Delta p}{p}\right) > 0,$$

which is positive because $x\ln x$ is strictly convex: expanding in powers of $\Delta p/p$, the leading term is $(\Delta p)^2/p > 0$.
This means that any disturbance away from the uniform distribution reduces the entropy (uncertainty). To show the same in the continuous case I'd have to use the calculus of variations or something along those lines, but in principle you'd get the same kind of result.
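The discrete argument is easy to check numerically. A small sketch: start from a uniform distribution over m points, move Δp from one cell to another, and compare Shannon entropies:

```python
import math

def shannon_entropy(p):
    """H = -sum p_k ln p_k (terms with p_k = 0 contribute 0)."""
    return -sum(pk * math.log(pk) for pk in p if pk > 0)

m = 10
p = [1 / m] * m          # uniform: maximal entropy, H = ln(m)
q = p.copy()
dp = 0.05
q[0] -= dp               # take Δp from one point...
q[3] += dp               # ...and add it to another

print(shannon_entropy(p))                        # ln(10) ≈ 2.3026
print(shannon_entropy(p) - shannon_entropy(q))   # > 0: the perturbation lowers H
```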
UPDATE 2:
The mean of n uniform random variables is itself a random variable, and it follows the Bates distribution. From the CLT we know that this new random variable's variance shrinks as n→∞ (it equals 1/(12n)), so the uncertainty about its location must decrease as n grows: we're more and more certain that the cat's in the middle. The plot and MATLAB code below show how the entropy decreases from 0 for n=1 (the uniform distribution) down to n=13. I'm using the distributions31 library here.
x = 0:0.01:1;
for k = 1:5
    n = 1 + (k-1)*3;              % n = 1, 4, 7, 10, 13 uniforms in the mean
    idx(k) = n;
    f = @(x) bates_pdf(x, n);     % Bates(n) density from distributions31
    funb = @(x) f(x).*log(f(x));
    fun = @(x) arrayfun(funb, x);
    h(k) = -integral(fun, 0, 1);  % differential entropy -int f ln f dx
    subplot(1, 6, k)
    plot(x, arrayfun(f, x))
    title(['Bates(x,' num2str(n) ')'])
    ylim([0 6])
end
subplot(1, 6, 6)
plot(idx, h)
title 'Entropy'
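For readers without MATLAB, here's a rough Python equivalent. The `bates_pdf` helper below is my own sketch of the standard closed form of the Bates density, not a library function, and the entropy integral is split at the knots k/n where the piecewise-polynomial density has kinks:

```python
import math
from scipy import integrate

def bates_pdf(x, n):
    # Closed form of the Bates(n) density on [0, 1]:
    # f(x) = n/(2 (n-1)!) * sum_{k=0}^{n} (-1)^k C(n,k) (nx-k)^(n-1) sgn(nx-k)
    if not 0 <= x <= 1:
        return 0.0
    s = sum((-1) ** k * math.comb(n, k) * (n * x - k) ** (n - 1)
            * math.copysign(1, n * x - k)
            for k in range(n + 1))
    return n * s / (2 * math.factorial(n - 1))

def bates_entropy(n):
    # H = -∫ f ln f, integrated piece by piece between the knots k/n
    def integrand(x):
        f = bates_pdf(x, n)
        return -f * math.log(f) if f > 0 else 0.0
    return sum(integrate.quad(integrand, k / n, (k + 1) / n)[0]
               for k in range(n))

hs = [bates_entropy(n) for n in (1, 4, 7, 10, 13)]
print([round(h, 3) for h in hs])  # starts at 0.0 and decreases with n
```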