This is related to an open research question, which is known as the "Online Boolean Matrix-Vector Multiplication (OMv) problem". This problem reads as follows (see [1]): Given a binary $n \times n$ matrix $M$ and $n$ binary column vectors $v_1, \dots, v_n$, we need to compute $Mv_i$ before $v_{i+1}$ arrives.
Notice that the problem from the question is somewhat more general: It allows for $m \times n$ matrices and real-valued vectors. Observe that the problem with $n \times n$ matrices and Boolean vectors is "easier", as it is a special case.
Clearly, the naïve algorithm for the Online Boolean Matrix-Vector Multiplication problem (which just performs a standard matrix-vector multiplication for each arriving vector) takes time $O(n^3)$. There is a conjecture (see e.g. [1]) that this cannot be done truly faster. In more detail, the conjecture states that there is no truly subcubic algorithm solving the Online Boolean Matrix-Vector Multiplication problem, i.e. no algorithm with running time $O(n^{3-\varepsilon})$ for some $\varepsilon > 0$.
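For concreteness, here is a minimal Python sketch of the naïve online algorithm (the function name and the tiny example are my own illustration, not taken from [1]):

```python
# Naive OMv sketch: each of the n arriving vectors is multiplied by M
# with the standard O(n^2) row-by-row loop, so processing all n rounds
# takes O(n^3) total. Products/sums are over the Boolean semiring.

def naive_omv(M, vectors):
    """M: n x n list of 0/1 rows; vectors: iterable of binary vectors.
    Yields M v_i (Boolean AND/OR arithmetic) before reading v_{i+1}."""
    n = len(M)
    for v in vectors:                      # n online rounds
        result = [0] * n
        for i in range(n):                 # O(n^2) work per round
            result[i] = int(any(M[i][j] and v[j] for j in range(n)))
        yield result                       # answer emitted before next vector

# Tiny example with the 2x2 identity matrix
M = [[1, 0],
     [0, 1]]
out = list(naive_omv(M, [[1, 0], [1, 1]]))  # → [[1, 0], [1, 1]]
```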
It is known that Williams's algorithm solves this problem in time $O(n^3/\log^2 n)$. See [2] for more details.
It would be a breakthrough in the area of conditional lower bounds if one could prove or disprove the above conjecture.
[1] Unifying and Strengthening Hardness for Dynamic Problems via an Online Matrix-Vector Multiplication Conjecture, by Henzinger, Krinninger, Nanongkai and Saranurak.
[ http://eprints.cs.univie.ac.at/4351/1/OMv_conjecture.pdf ]
[2] Matrix-vector multiplication in sub-quadratic time: (some preprocessing required), by Williams.
[ http://dl.acm.org/citation.cfm?id=1283383.1283490 ]
Update
One of the questions in the comments was as follows: We know $M$ at compile time. Can't we tailor our algorithm to $M$, so that the OMv conjecture does not apply? We will see that this is not the case, unless the OMv conjecture fails.
The proof idea is simple: Assume we could give fast algorithms for all matrices up to some fixed size (e.g. by distinguishing all possible cases). Beyond this size, we use divide and conquer.
Here are the details:
Fix some $n_0 \in \mathbb{N}$, which (without loss of generality) is a power of 2 and bigger than 2. Now assume that for all $n \le n_0$ and all $n \times n$ matrices $M$ we know an algorithm $A_{n,M}$ that, for all vectors $v$, computes $Mv$ in truly subquadratic time, i.e. in time $O(n^{2-\varepsilon})$ for some $\varepsilon > 0$. (Notice that this allows an individual algorithm for each matrix up to size $n_0 \times n_0$.)
Now we will solve OMv in truly subcubic time:
Given a binary matrix $M$ of size $n \times n$, where $n = 2^k$ for some $k$ and $n > n_0$, we use a divide-and-conquer strategy: We divide $M$ into four submatrices $M_1, M_2, M_3, M_4$ of size $2^{k-1} \times 2^{k-1}$. If $2^{k-1} \le n_0$, then we use algorithm $A_{2^{k-1},M_i}$; otherwise, we recurse. (As $n_0$ is a fixed constant, we can pick the correct algorithm in constant time.)
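The divide-and-conquer step can be sketched in Python as follows. This is only an illustration of the recursion: `base_case` is a hypothetical stand-in for the assumed algorithms $A_{n,M}$ (implemented here simply as the naïve product, since the fast per-matrix algorithms are only assumed to exist):

```python
# Divide-and-conquer sketch: split M into four half-size blocks and
# combine the block results with Boolean OR. `base_case` models the
# hypothetical truly-subquadratic algorithms A_{n,M} for n <= N0.

N0 = 4  # illustrative threshold; in the argument, the fixed constant n_0

def base_case(M, v):
    # Stand-in for A_{n,M}; here just the naive Boolean product.
    n = len(M)
    return [int(any(M[i][j] and v[j] for j in range(n))) for i in range(n)]

def dc_mv(M, v):
    """Compute Mv (Boolean semiring), assuming len(M) is a power of 2."""
    n = len(M)
    if n <= N0:
        return base_case(M, v)
    h = n // 2
    M1 = [row[:h] for row in M[:h]]   # top-left block
    M2 = [row[h:] for row in M[:h]]   # top-right block
    M3 = [row[:h] for row in M[h:]]   # bottom-left block
    M4 = [row[h:] for row in M[h:]]   # bottom-right block
    v_top, v_bot = v[:h], v[h:]
    # Top half of Mv is (M1 v_top) OR (M2 v_bot); bottom half analogously.
    top = [a or b for a, b in zip(dc_mv(M1, v_top), dc_mv(M2, v_bot))]
    bot = [a or b for a, b in zip(dc_mv(M3, v_top), dc_mv(M4, v_bot))]
    return top + bot
```

Note that the recursion only touches how the work is routed to the base-case algorithms; the block decomposition itself adds just the cost of the OR-combinations.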
Notice that we will need at most $O(\log n)$ recursion steps. Also, for the $n$ vectors $v_1, \dots, v_n$, we will need $n$ such computations. Thus, to process all matrix-vector multiplications we will need a total computation time of $O(n^{3-\varepsilon} \log n)$.
It is well known that the logarithm grows slower than any polynomial (in particular, slower than any root). Fixing some $\tilde\varepsilon > 0$ with $\tilde\varepsilon < \varepsilon$, we see that our total computation runs in truly subcubic time (in particular, in time $O(n^{3-\tilde\varepsilon})$). Thus, the OMv conjecture would be false.
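Spelled out, this last step is the following inequality: since $\log n = O(n^{\varepsilon - \tilde\varepsilon})$ for any $\tilde\varepsilon < \varepsilon$, we get

$$n^{3-\varepsilon} \log n \;=\; O\!\left(n^{3-\varepsilon} \cdot n^{\varepsilon - \tilde\varepsilon}\right) \;=\; O\!\left(n^{3-\tilde\varepsilon}\right).$$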
(If $M$ has size $m \times n$ and $m$ and $n$ are not powers of 2, then the bounds on the running times still apply, as we can just increase $n$ and $m$ to the next powers of 2.)
Conclusion: If you could make use of case distinctions on the input matrices to derive fast algorithms, then you would refute the OMv conjecture.