CAMO Software

Interpreting Statistical Software "Error" Messages

 

March 2000

Most statistical software packages depend entirely on the user to select and specify the correct model.  Whatever model the user defines, the computer will "fit" that model.  Incorrect conclusions can be drawn when inappropriate models are fit to the data.  The difficulty is that the program generally cannot tell you when the model is inappropriate.

Some common errors in model specification include the use of interactions without replication, the use of interactions with incomplete or missing data, and the misuse of nested models.  Fortunately, many software packages give you hints that you have misspecified the model and many errors can be avoided by properly interpreting these hints.
 
 

For instance, suppose a model is fit and the output provides no estimate of the Mean Square Error (MSE) and no F-tests or p-values.  The model as specified was fit correctly, so the user sees no error message.  However, the absence of F-tests is itself an error message: it indicates that all the degrees of freedom (df) have been exhausted, leaving none for the MSE.  This commonly happens when interactions are fit without replication in the data.  Suppose 50 consumers each saw 4 products and the following model was fit:

 Source of Variation     df
 Panelist                49
 Product                  3
 Panelist*Product       147

If you total the degrees of freedom you get 49 + 3 + 147 = 199, which is the entire total df for 200 observations (200 - 1), so there are no df left for the error term!  Remove the interaction from the model and the problem is solved.  (In some cases, if panelist is properly specified as a random effect, the test for product differences will still be performed in the above model.  Thus, if no tests are performed, also check whether panelist is specified as a fixed or random effect.)
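The df bookkeeping for this first example can be checked by hand or with a few lines of code. The following is a minimal sketch in Python, not output from any particular statistics package; the helper name `residual_df` and the dictionary layout are illustrative assumptions. It assumes a balanced design with exactly one observation per panelist/product combination.

```python
def residual_df(n_obs, term_dfs):
    """Degrees of freedom left for error after fitting the listed terms.

    Hypothetical helper: total df is n_obs - 1 (one df goes to the mean).
    """
    return (n_obs - 1) - sum(term_dfs.values())


n_panelists, n_products = 50, 4
n_obs = n_panelists * n_products  # one observation per cell, no replication

main_effects = {
    "Panelist": n_panelists - 1,  # 49
    "Product": n_products - 1,    # 3
}

# Same model with the Panelist*Product interaction added.
with_interaction = dict(main_effects)
with_interaction["Panelist*Product"] = (n_panelists - 1) * (n_products - 1)  # 147

df_full = residual_df(n_obs, with_interaction)
df_reduced = residual_df(n_obs, main_effects)

print("Error df with interaction:   ", df_full)     # 0
print("Error df without interaction:", df_reduced)  # 147
```

With the interaction in the model the error df is 0, so no MSE can be estimated; dropping the interaction frees 147 df for the error term.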
 
 

A second instance is the improper use of a nested model.  Suppose data are collected in two cities (50 panelists per city, with each panelist evaluating the same 3 products) and the following model is fit:

 Source of Variation     df
 Panelist                99
 Product                  2
 City                     1
 Product*City             2
 Error                  195

However, this model may result in the Product or City term having 0 degrees of freedom in the output given by the computer program.  Again, this is an "error message."  While Product and City are crossed effects and their interaction is correct, panelists are nested within city (different panelists in different cities).  Thus, the correct model is:

 Source of Variation     df
 City                     1
 Panelist within City    98
 Product                  2
 Product*City             2
 Error                  196
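The df in the nested model can be verified the same way. The sketch below is illustrative Python arithmetic under the assumption of a balanced design (2 cities, 50 different panelists per city, 3 products, one observation per panelist/product combination); the variable names are hypothetical and do not come from any statistics package. It also shows one plausible reason a term can drop to 0 df in the crossed model: the 99 df claimed by a crossed Panelist term already contain the 1 df for City.

```python
n_cities = 2
panelists_per_city = 50
n_products = 3
n_obs = n_cities * panelists_per_city * n_products  # 300 observations

# Panelists nested within city: each city contributes (50 - 1) df.
nested_model = {
    "City": n_cities - 1,                                        # 1
    "Panelist within City": n_cities * (panelists_per_city - 1),  # 98
    "Product": n_products - 1,                                   # 2
    "Product*City": (n_products - 1) * (n_cities - 1),           # 2
}
nested_model["Error"] = (n_obs - 1) - sum(nested_model.values())  # 196

for source, df in nested_model.items():
    print(f"{source:<22}{df:>4}")

# A crossed Panelist term over all 100 panelists claims 99 df, which is
# exactly City (1) + Panelist within City (98), leaving nothing for City.
crossed_panelist_df = n_cities * panelists_per_city - 1
overlap = nested_model["City"] + nested_model["Panelist within City"]
print("Crossed Panelist df:", crossed_panelist_df, "= City + Panelist(City):", overlap)
```

Because City differences are entirely between panelists, fitting Panelist as a crossed effect uses up the City df, which is consistent with a 0 df line appearing in the output.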

Consequently, recheck the appropriateness of your model whenever your software package returns nonsensical analysis results.

To learn more about data analysis interpretation, please call us at +1 541.757.1404 or email info@camo.com.