Methods specified for variable selection using statistic such as P, stepwise regression in the classic text Hosmer et al should at all cost be avoided.
Recently I stumbled upon an article that was published in the international journal of forecasting entitle "Illusions of predictability" and a commentory on this article by Keith ord. I would highly recommend both these article as they clearly show that using regression statistic is often misleading. Follwoing is a screenshot of Keith Ord's article that shows by simulation why step wise regression (uses p statistic) for variable selection is bad.
Another wonderful article by Scott Armstrong that appeared in the same issue of the journal shows why one should be very cautious on using regression analysis on non-experimental data with case studies. Ever since I read these articles I avoid using regression analysis to draw causal inferences on non-experimental data. As a practitioner, I wish I had read articles like this many years which would have saved me from making bad decisions and avoiding costly mistakes.
On your specific problem, I don't think randomized experiments are possible in your case, so I would recommend that you use cross validation to select variables. A nice worked out example is available in this free online book on how you would use predictive accuracy to select variables. It also many othervariable selction methods, but I woud restrict to cross validation.
I personally like the quote from Armstrong "Somewhere I encountered the idea that statistics was supposed to aid communication. Complex regression methods and a flock of diagnostic statistics have taken us in the other direction"
Below is my own opinion. I'm not a statistician.
As a biologist I think you would appreciate this point. Nature is very complex, assuming logistic function and no interaction among variables does not occur in nature. In addition, logistic regression has following assumptions:
The true conditional probabilities are a logistic function of the
independent variables.
No important variables are omitted. No extraneous variables are included.
- The independent variables are measured without error.
- The observations are independent.
- The independent variables are not linear combinations of each other.
I would recommend classification and regression tree (CART(r)) as an alternative over logistic regression for this type of analysis because it is assumptions free:
- Non parametric/Data Driven/No assumptions that your output probablities follow logistic function.
- Non linear
- allows complex variable interaction.
- Provides highly interpretable visual trees that a non statistician like forest managers would appreciate.
- Easily handles missing values.
- Dont need to be a statistician to use CART!!
- automatically selects variables using cross validation.
CART is a trademark of Salford Systems. See this video for introduction and history of CART. There are also other videos such as cart - logistic regrssion hybrids in the same website. I would check it out. an open source impentation in R is called Tree, and there are many other packages such as rattle available in R. If I find time, I will post the first example in Homser's text using CART. If you insist on using logistic regression, then I would at least use methods like CART to select variables and then apply logistic regression.
I personally prefer CART over logistic regression because of aforementioned advantages. But still, I would try both logistic regression and CART or CART-Logistc Regression Hybrid, and see which gives better predictive accuracy and also more importantly better interpretatablity and choose the one that you feel would "communicate" the data more clearly.
Also, FYI CART was rejected by major statistical journals and finally the inventors of CART came out with a monograph. CART paved way to modern and highly successful machine learning algorithms like Random Forest(r), Gradient Boosting Machines (GBM), Multivariate Adaptive Regression Splines all were born. Randomforest and GBM are more accurate than CART but less interprettable (black box like) than CART.
Hopefully this is helpful. Let me know if you find this post useful ?