The answer above using stochastic equicontinuity works very well, but here I am answering my own question by using a uniform law of large numbers to show that the observed information matrix is a strongly consistent estimator of the information matrix, i.e. $N^{-1}J_N(\hat{\theta}_N(Y)) \overset{a.s.}{\longrightarrow} I(\theta_0)$ when we plug in a strongly consistent sequence of estimators. I hope it is correct in all details.
We will use $I_N = \{1, 2, \ldots, N\}$ as an index set, and let us temporarily adopt the notation $J(\tilde{Y}, \theta) := J(\theta)$ in order to be explicit about the dependence of $J(\theta)$ on the random vector $\tilde{Y}$. We shall also work elementwise with $(J(\tilde{Y}, \theta))_{rs}$ and $(J_N(\theta))_{rs} = \sum_{i=1}^N (J(Y_i, \theta))_{rs}$, $r, s = 1, \ldots, k$, for this discussion. The function $(J(\cdot, \cdot))_{rs}$ is real-valued on the set $\mathbb{R}^n \times \Theta^\circ$, and we will suppose that $(J(\cdot, \theta))_{rs}$ is Lebesgue measurable for every $\theta \in \Theta^\circ$. A uniform (strong) law of large numbers gives a set of conditions under which
$$\sup_{\theta \in \Theta} \left| N^{-1}(J_N(\theta))_{rs} - E_\theta\left[(J(Y_1, \theta))_{rs}\right] \right| = \sup_{\theta \in \Theta} \left| N^{-1}\sum_{i=1}^N (J(Y_i, \theta))_{rs} - (I(\theta))_{rs} \right| \overset{a.s.}{\longrightarrow} 0. \tag{1}$$
The conditions that must be satisfied in order for (1) to hold are: (a) $\Theta^\circ$ is a compact set; (b) $(J(\tilde{Y}, \theta))_{rs}$ is a continuous function of $\theta$ on $\Theta^\circ$ with probability 1; (c) $(J(\tilde{Y}, \theta))_{rs}$ is dominated by a function $h(\tilde{Y})$ for every $\theta \in \Theta^\circ$, i.e. $|(J(\tilde{Y}, \theta))_{rs}| < h(\tilde{Y})$; and (d) $E_\theta[h(\tilde{Y})] < \infty$ for each $\theta \in \Theta^\circ$. These conditions come from Jennrich (1969, Theorem 2).
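To make conditions (a)–(d) concrete, here is a quick toy check (my own illustration, not part of the original argument). Take the i.i.d. Poisson model with mean $\theta$, so $\log f(y; \theta) = y\log\theta - \theta - \log y!$ and the scalar ($k = 1$) observed information contribution is $(J(y, \theta))_{11} = -\partial^2_\theta \log f(y; \theta) = y/\theta^2$. On a compact set $[a, b] \subset (0, \infty)$ playing the role of $\Theta^\circ$: (a) holds by construction; (b) holds since $y/\theta^2$ is continuous in $\theta$ for every $y$; (c) holds with $h(y) = y/a^2$, since $|y/\theta^2| \le y/a^2$ for all $\theta \in [a, b]$; and (d) holds since $E_\theta[h(Y)] = \theta/a^2 < \infty$.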
Now for any $y_i \in \mathbb{R}^n$, $i \in I_N$, and $\theta' \in S \subseteq \Theta^\circ$, the following inequality obviously holds:
$$\left| N^{-1}\sum_{i=1}^N (J(y_i, \theta'))_{rs} - (I(\theta'))_{rs} \right| \le \sup_{\theta \in S} \left| N^{-1}\sum_{i=1}^N (J(y_i, \theta))_{rs} - (I(\theta))_{rs} \right|. \tag{2}$$
Suppose that $\{\hat{\theta}_N(Y)\}$ is a strongly consistent sequence of estimators for $\theta_0$, and let $\Theta_{N_1} = B_{\delta_{N_1}}(\theta_0) \subseteq K \subseteq \Theta^\circ$ be an open ball in $\mathbb{R}^k$ with radius $\delta_{N_1} \to 0$ as $N_1 \to \infty$, where $K$ is compact. Then, since strong consistency implies $\hat{\theta}_N(Y) \in \Theta_{N_1}$ for all sufficiently large $N$ with probability 1, we have $P[\lim_N \{\hat{\theta}_N(Y) \in \Theta_{N_1}\}] = 1$. Together with (2) this implies
$$P\left[ \lim_{N \to \infty} \left\{ \left| N^{-1}\sum_{i=1}^N (J(Y_i, \hat{\theta}_N(Y)))_{rs} - (I(\hat{\theta}_N(Y)))_{rs} \right| \le \sup_{\theta \in \Theta_{N_1}} \left| N^{-1}\sum_{i=1}^N (J(Y_i, \theta))_{rs} - (I(\theta))_{rs} \right| \right\} \right] = 1. \tag{3}$$
Now $\Theta_{N_1} \subseteq K \subseteq \Theta^\circ$ with $K$ compact, so conditions (a)–(d) of Jennrich (1969, Theorem 2) apply on $K$, and hence the uniform convergence in (1) holds over $\Theta_{N_1}$ as well. Thus (1) and (3) imply
$$P\left[ \lim_{N \to \infty} \left| N^{-1}\sum_{i=1}^N (J(Y_i, \hat{\theta}_N(Y)))_{rs} - (I(\hat{\theta}_N(Y)))_{rs} \right| = 0 \right] = 1. \tag{4}$$
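For intuition, here is a minimal numerical sketch of the sup bound in (3) collapsing to give (4) (my own illustration, assuming the i.i.d. Poisson($\theta_0$) toy model from above; the seed, sample sizes, grid resolution, and shrinkage rate $\delta_N = N^{-1/4}$ are all arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(7)
theta0 = 3.0  # true Poisson mean

for N in (10**2, 10**4, 10**6):
    y = rng.poisson(lam=theta0, size=N)
    ybar = y.mean()
    delta = N ** -0.25  # shrinking ball radius around theta0 (arbitrary rate)
    grid = np.linspace(theta0 - delta, theta0 + delta, 201)
    # For the Poisson model: N^{-1} * sum_i (J(Y_i, theta))_{11} = ybar / theta^2
    # and (I(theta))_{11} = 1 / theta.
    sup_gap = np.max(np.abs(ybar / grid**2 - 1.0 / grid))
    print(f"N={N:>8}  delta={delta:.4f}  sup gap = {sup_gap:.6f}")
```

The printed sup gap over the shrinking ball should tend to zero as $N$ grows.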
Since $(I(\hat{\theta}_N(Y)))_{rs} \overset{a.s.}{\longrightarrow} (I(\theta_0))_{rs}$ by continuity of $(I(\theta))_{rs}$ and the strong consistency of $\hat{\theta}_N(Y)$, (4) implies that $N^{-1}(J_N(\hat{\theta}_N(Y)))_{rs} \overset{a.s.}{\longrightarrow} (I(\theta_0))_{rs}$. Note that (3) holds however small $\Theta_{N_1}$ is, and so the result in (4) is independent of the choice of $N_1$, other than that $N_1$ must be chosen so that $\Theta_{N_1} \subseteq \Theta^\circ$. This result holds for all $r, s = 1, \ldots, k$, and so in terms of matrices we have $N^{-1}J_N(\hat{\theta}_N(Y)) \overset{a.s.}{\longrightarrow} I(\theta_0)$.
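Finally, a minimal numerical sketch of the conclusion itself (again my own illustration with the Poisson toy model; here $N^{-1}J_N(\hat{\theta}_N) = \bar{y}/\hat{\theta}_N^2 = 1/\bar{y}$, which should approach $I(\theta_0) = 1/\theta_0$):

```python
import numpy as np

rng = np.random.default_rng(42)
theta0 = 3.0               # true Poisson mean
fisher = 1.0 / theta0      # I(theta0) = 1/theta0 for the Poisson model

for N in (10**2, 10**4, 10**6):
    y = rng.poisson(lam=theta0, size=N)
    theta_hat = y.mean()   # MLE, a strongly consistent estimator of theta0
    # Averaged observed information at the plug-in estimate:
    # N^{-1} J_N(theta_hat) = ybar / theta_hat^2
    avg_obs_info = y.mean() / theta_hat**2
    print(f"N={N:>8}  N^-1 J_N = {avg_obs_info:.6f}  I(theta0) = {fisher:.6f}")
```

The averaged observed information should settle on $1/\theta_0 \approx 0.3333$ as $N$ grows, illustrating $N^{-1}J_N(\hat{\theta}_N(Y)) \overset{a.s.}{\longrightarrow} I(\theta_0)$.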