March 2000
Most statistical software packages depend entirely on the user to select and specify the correct model. Whatever model the user defines, the computer will "fit" that model. Incorrect conclusions can be drawn when inappropriate models are fit to the data. The difficulty is that the computer program generally cannot tell you when the model is inappropriate.
Some common errors in model specification include the use of interactions without replication, the use of interactions with incomplete or missing data, and the misuse of nested models. Fortunately, many software packages give hints that the model has been misspecified, and many errors can be avoided by interpreting these hints properly.
For instance, suppose a model is fit and the output provides no estimate of the Mean Square Error (MSE) and no F-tests or p-values. The model was fit to the data exactly as specified, so the user sees no error messages. However, the lack of F-tests is itself an error message: it indicates that all the degrees of freedom (df) have been exhausted, leaving none for the MSE. This commonly happens when interactions are fit without replication in the data. Suppose 50 consumers each saw 4 products and the following model was fit:
Source of Variation     df
Panelist                49
Product                  3
Panelist*Product       147
If you total up the degrees of freedom, you see that there are no df left for the error term: 50 panelists by 4 products gives 200 observations and therefore 199 total df, and 49 + 3 + 147 = 199. Remove the interaction from the model and the problem is solved, because its 147 df then become the error df. (In some cases, if panelist is properly specified as a random effect, the test for product differences will still be performed in the above model. Thus, if no tests are performed, also check whether panelist is specified as a fixed or random effect.)
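The degrees-of-freedom bookkeeping for this example can be checked with a few lines of Python. This is only a sketch of the arithmetic for a balanced design, not the output of any statistics package; the helper name `crossed_design_df` and its parameters (`reps`, `include_interaction`) are hypothetical and introduced here for illustration.

```python
def crossed_design_df(n_panelists, n_products, reps=1, include_interaction=True):
    """Return the df for each source in a balanced Panelist x Product design.

    Hypothetical helper for illustration. `reps` is the number of times
    each panelist evaluates each product.
    """
    n_obs = n_panelists * n_products * reps
    df = {
        "Panelist": n_panelists - 1,
        "Product": n_products - 1,
    }
    if include_interaction:
        df["Panelist*Product"] = (n_panelists - 1) * (n_products - 1)
    # Whatever remains of the total df (n_obs - 1) is left for the error term.
    df["Error"] = (n_obs - 1) - sum(df.values())
    return df


# 50 panelists, 4 products, no replication, interaction in the model.
print(crossed_design_df(50, 4))
# Same data with the interaction removed from the model.
print(crossed_design_df(50, 4, include_interaction=False))
# Interaction kept, but each panelist sees each product twice.
print(crossed_design_df(50, 4, reps=2))
```

The first call leaves 0 df for error, which is the situation in which no F-tests can be computed. Dropping the interaction leaves 147 error df, and replicating each panelist-by-product combination twice leaves 200 error df with the interaction still in the model.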
A second instance is the improper use of a nested model. Suppose data are collected in two cities (50 panelists per city, with each panelist evaluating the same 3 products) and the following model is fit:
Source of Variation     df
Panelist                99
Product                  2
City                     1
Product*City             2
Error                  195
However, this model may result in the Product or City term having 0 degrees of freedom in the output given by the computer program. Again, this is an "error message." Product and City are crossed effects, so their interaction is correct, but panelists are nested within city (different panelists in different cities), so the difference between cities is already contained in the differences among panelists. Thus, the correct model is:
Source of Variation     df
City                     1
Panelist within City    98
Product                  2
Product*City             2
Error                  196
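The df in the corrected model can be reproduced with the same kind of arithmetic. Again, this is a sketch for a balanced design; the helper name `nested_design_df` and its parameters are hypothetical and introduced here for illustration.

```python
def nested_design_df(n_cities, panelists_per_city, n_products):
    """Return the df for each source when panelists are nested within city.

    Hypothetical helper for illustration; assumes every panelist
    evaluates every product exactly once.
    """
    n_obs = n_cities * panelists_per_city * n_products
    df = {
        "City": n_cities - 1,
        # Each city contributes (panelists_per_city - 1) df among its own panelists.
        "Panelist within City": n_cities * (panelists_per_city - 1),
        "Product": n_products - 1,
        "Product*City": (n_products - 1) * (n_cities - 1),
    }
    # Whatever remains of the total df (n_obs - 1) is left for the error term.
    df["Error"] = (n_obs - 1) - sum(df.values())
    return df


df = nested_design_df(n_cities=2, panelists_per_city=50, n_products=3)
print(df)

# City and Panelist within City together account for all 99 df among the
# 100 panelists, so a crossed Panelist term with 99 df leaves no separate
# df for City.
print(df["City"] + df["Panelist within City"])
```

For 2 cities, 50 panelists per city and 3 products, this gives 1, 98, 2, 2 and 196 df, which sum to the 299 total df for 300 observations.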
Consequently, recheck the appropriateness of your model when faced with nonsensical analysis results from your computer package.
