Results

At first sight, the results of the unpaired Wilcoxon tests did not support the aesthetics metrics: only 17 of 52 items matched, where an item is the metric value and the score given by the users for one metric on one specific UI (i.e. 13 metrics * 4 UIs = 52 items). In other words, only 1 metric out of 12 (proportion) significantly represented the users' reviews for every UI. This is not surprising in itself, since an exact match can hardly be expected when comparing formula values with Likert-scale values. However, the results of the paired Wilcoxon tests were much more informative, because they indicate how the UIs were ranked for a specific metric. For each metric, we compared the UIs pairwise: the median review score of one UI against that of another UI. If the p-value is below 10%, the null hypothesis that the two scores are equal is rejected, and the positive or negative sign of the test statistic indicates which of the two UIs is rated higher. This makes it possible to compare, for each metric, the UI ranking given by the formulas on the one hand with the ranking given by the users on the other. For 4 metrics out of 12, the UIs were ranked largely the same way by both criteria, which suggests that the formulas for these metrics are representative of the human eye. Those metrics were balance, equilibrium, density, and economy.
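The pairwise procedure described above can be sketched as follows. This is only an illustrative sketch, not the analysis code used in the study: the function names `wilcoxon_signed_rank` and `rank_uis`, the UI labels, and the sample Likert scores are all hypothetical. The test uses the normal approximation without a tie correction, and ordering the UIs by their number of significant pairwise wins is an assumption about how the pairwise outcomes could be turned into a ranking.

```python
import math
from itertools import combinations


def wilcoxon_signed_rank(x, y):
    """Paired Wilcoxon signed-rank test (normal approximation, no tie correction).

    Returns (z, p) where p is two-sided. Zero differences are dropped and
    tied absolute differences receive average ranks.
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    if n == 0:
        return 0.0, 1.0  # identical samples: nothing to reject
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value
    return z, p


def rank_uis(scores, alpha=0.10):
    """Order UIs by significant pairwise wins (assumed ranking rule).

    scores maps a UI name to the list of user scores, in the same user order.
    """
    wins = {ui: 0 for ui in scores}
    for a, b in combinations(scores, 2):
        z, p = wilcoxon_signed_rank(scores[a], scores[b])
        if p < alpha:  # reject equality at the 10% level
            wins[a if z > 0 else b] += 1  # the sign tells which UI is rated higher
    return sorted(scores, key=lambda ui: -wins[ui])


if __name__ == "__main__":
    # Hypothetical Likert scores (1-5) from ten users for one metric.
    sample = {
        "UI1": [5, 5, 4, 5, 4, 5, 5, 4, 5, 4],
        "UI2": [2, 1, 2, 3, 1, 2, 2, 1, 3, 2],
        "UI3": [1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
    }
    print(rank_uis(sample))
```

The user ranking obtained this way can then be compared, metric by metric, with the ranking obtained by sorting the UIs on the formula value. For small samples, an exact test from a statistics package is preferable to the normal approximation used here.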


Table: Ranking of the UIs with respect to balance only, according to the users and to the metric
\begin{table}
\footnotesize
\centering
\begin{tabularx}{\linewidth}{\vert>{...
...ex]
\cline{2-3}
&ATM &ATM&\\ [1.5ex]
\hline
\end{tabularx}
\end{table}


Zen 2014-05-07