Principles of Signal Detection Theory and Application to Interface Design
The primary goal of a user interface is to present information clearly. With a good interface, users can recognize features, understand what the interface is helping them accomplish, predict outcomes, and successfully complete their task. Building on this principle, using multiple screen sequences to break up steps can reduce confusion by presenting the user with only a small amount of information at any given time. Doing so gives the interaction a more streamlined feel, limiting the interference of irrelevant information. In this way, interfaces clarify, highlight, provide access, and enable users. Naturally, some interfaces are better than others, and the method by which information is displayed can have a large impact on the overall user experience.
Decision making occurs under a range of circumstances, often with some level of uncertainty. Signal Detection Theory provides a more complete understanding of decision making in the presence of uncertainty by offering a model for evaluating a user's sensitivity and bias in detecting a signal within noise. The theory is applicable to any situation – not necessarily a user interface – in which signal and noise are difficult to discriminate. In a yes/no decision, the user indicates either that a signal was present or that it was not. This leads to four possible outcomes: hit, miss (false rejection), false alarm (false hit), and correct rejection. Using the hit and false alarm rates to analyze user performance, data on individual user sensitivity (d') and criterion (beta) can be obtained. When evaluating comparable interfaces, the principles of signal detection theory can underscore strengths and shortcomings. In real-world situations there are potential consequences to raising a false alarm or missing a signal, and the desire to avoid those consequences influences a user's criterion. Users must make an internal decision about how liberal or conservative to be and whether that setting is appropriate for their given task (Wickens, 2002).
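As a concrete illustration, the minimal sketch below (in Python, assuming scipy is available; the function name and sample rates are hypothetical, not taken from the study) shows how d' and beta can be derived from a participant's hit and false alarm rates under the standard equal-variance Gaussian model:

```python
from scipy.stats import norm

def sdt_measures(hit_rate, fa_rate):
    """Compute sensitivity (d') and criterion (beta) from a hit rate and
    a false alarm rate under the equal-variance Gaussian model."""
    z_hit = norm.ppf(hit_rate)  # inverse-normal transform of the hit rate
    z_fa = norm.ppf(fa_rate)    # inverse-normal transform of the false alarm rate
    d_prime = z_hit - z_fa      # separation of the signal and noise distributions
    # beta: likelihood ratio of signal to noise at the observer's criterion
    beta = norm.pdf(z_hit) / norm.pdf(z_fa)
    return d_prime, beta

# Example: a participant with a 60% hit rate and a 3% false alarm rate
d_prime, beta = sdt_measures(0.60, 0.03)
print(f"d' = {d_prime:.2f}, beta = {beta:.2f}")  # d' = 2.13, beta = 5.68
```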
An experiment designed to highlight the principles of signal detection theory had participants interact with two user interfaces, one with 10 elements of information per screen (V10) and one with 3 elements per screen (V3). For each version, the participant was briefly shown a total of 1800 screens and was instructed to indicate whether they detected a signal (Y) within the noise (V). Each screen presented either noise alone (all V units) or one signal unit (Y) surrounded by noise (V). Response data were collected for the entire group and analyzed to present a more comprehensive understanding of the users' interaction with each interface. Sensitivity (d') and criterion (beta) were calculated, and z-scores were graphed for each participant to give a visual representation of each interface.
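A minimal sketch of how such trial screens might be generated (the function name is hypothetical; the actual experiment software was not specified):

```python
import random

def make_screen(n_elements, signal_present):
    """Build one stimulus screen: n_elements noise units ('V'), with one
    signal unit ('Y') replacing a noise unit at random when present."""
    screen = ['V'] * n_elements
    if signal_present:
        screen[random.randrange(n_elements)] = 'Y'
    return screen

# e.g. one V10 trial containing a signal
print(make_screen(10, signal_present=True))
```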
Detecting the signal was not always easy for the user. In some cases it was very clear to the user that a signal was displayed, perhaps because they happened to be looking in the right region by pure chance. In other cases the user could be completely unsure whether they saw a signal in the noise, and had to rely on some method of internal evaluation to decide whether they saw a V or a Y. Whether looking for subtle changes in letter shape or systematically scanning the screen from one side to the other in an attempt to take in every information unit, each user had to adopt a process for determining whether there was a signal or just noise.
Internal noise, the naturally occurring noise in one's nervous system, is normally distributed. This noise can sometimes cause people to believe they were exposed to a signal when in fact they were not. It is important to note that noise comes not only from the interface but also from the user's environment and internal neural activity. The level of internal noise varies from individual to individual and is reflected in the fact that some users reported a signal when none was present on the screen.
The decision process is as follows: when the internal response, shaped by the individual's sensitivity (d'), is greater than a predetermined criterion, the participant responds "yes"; whenever the internal response is less than the criterion, they respond "no". The user can adjust their criterion based on the application. If the consequence of missing a signal is greater than the consequence of raising a false alarm, the user can shift their criterion to be more liberal. If the opposite is true (the false alarm consequence is greater), the user can shift their criterion to be more conservative, minimizing the chance of a false alarm while accepting a decrease in "hit" responses (Wickens, 2002). A simulation of this decision rule is sketched below.
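Under the equal-variance Gaussian model, the rule can be simulated as follows (a sketch; the function name and parameter values are illustrative, not part of the study):

```python
import random

def respond(d_prime, criterion, signal_present):
    """Simulate one yes/no trial: the internal response is a unit-variance
    Gaussian sample centred at 0 for noise or at d' for signal; the
    observer says "yes" whenever the response exceeds the criterion."""
    mean = d_prime if signal_present else 0.0
    internal_response = random.gauss(mean, 1.0)
    return internal_response > criterion

# Lowering the criterion makes the observer more liberal (more hits but
# more false alarms); raising it makes the observer more conservative.
hits = sum(respond(1.5, 0.5, True) for _ in range(1000))
false_alarms = sum(respond(1.5, 0.5, False) for _ in range(1000))
print(hits / 1000, false_alarms / 1000)
```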
When analyzed, the results of this study reveal that all participants had a z-score value of less than 0, a consequence of where participants set their criterion (beta) and consistent with a more conservative bias. Fatigue, boredom, and lack of motivation all influence a participant's beta value, and some of these factors were likely induced by the study's design (a 30-minute session). Since the signal was difficult to detect, a conservative criterion is understandable.
The aggregate data from the 10-element interface yielded a mean hit rate of 27.78±16.24% and a mean false alarm rate of 4.08±5.91%. The standard deviation is especially large for both rates, suggesting that users did not interact with the interface in the same manner; differences in participant criterion are the root cause. This also indicates that the signal in the V10 interface was difficult to identify, producing a large discrepancy between liberal and conservative responders. The beta values support this finding: the V10 beta average was 19.68±28.07, with a minimum of 1.46 and a maximum of 87.59. A low beta value indicates that a user had both many hits and many false alarms (liberal), whereas a high beta corresponds to fewer hits and fewer false alarms (conservative).

The 3-element interface produced a much different result, as expected. The mean hit rate was 60.45±15.66% and the mean false alarm rate 2.56±2.73%. Compared to the V10 interface, the hit rate was substantially higher and the false alarm rate slightly lower. As with the V10 interface, however, variation in response criterion gives the hit and false alarm rates large standard deviations. The average beta value for the V3 interface was 17.19±20.48, with a minimum of 0.98 and a maximum of 69.70. Comparing the V3 results with those of the V10 interface, one will note the improvement in performance. This is intuitive, since the signal was masked by less noise than in the V10 version.
According to the instructions, some participants were supposed to skew their responses in either a conservative or a liberal direction; however, it does not appear that participants fully read or understood the study instructions. It is also possible that participants were not able to complete the task while maintaining a consistent level of attention. The test was administered in person, and although participants were asked to read through the written instructions, they likely attended more closely to the verbal instructions, which did not mention the designed variation in participant criterion. As a result, participants simply tried to complete the study to the best of their ability, limiting the number of false alarms and getting as many hits as possible. This is more easily understood by analyzing the graph of participant z-scores, with false alarm rate on the x-axis and hit rate on the y-axis.
The z-scores for each participant were calculated from the hit and false alarm rates and plotted to give a visual representation of the data. Graphing the z-scores is an effective way of analyzing performance since it takes into account the relationship between a participant's hit and false alarm rates. The average z-scores for hits and false alarms in the V10 interface were -0.65 and -2.12, respectively; each participant's performance on the V10 interface is presented in Figure 1. The average z-scores for hits and false alarms in the V3 version were 0.30 and -2.14, respectively; each participant's performance on the V3 interface is presented in Figure 2. The trend line for both versions reflects the normal distribution of performance. On the aggregate V10 graph (Figure 1), the dashed line represents the separation between liberal and conservative results for this specific user interface test: more liberal responses are distributed above and to the right of the line, whereas more conservative results fall below and to the left.
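A sketch of how such a z-score plot and its trend line might be produced (the per-participant rates below are illustrative placeholders, not the study data):

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

# Hypothetical per-participant rates (placeholders, not the study data)
hit_rates = np.array([0.25, 0.40, 0.15, 0.55, 0.30])
fa_rates = np.array([0.03, 0.08, 0.01, 0.10, 0.05])

z_hit = norm.ppf(hit_rates)  # z(hit) for each participant
z_fa = norm.ppf(fa_rates)    # z(false alarm) for each participant

# Scatter of participant performance in z-space, with a least-squares
# trend line whose slope is the quantity discussed below
slope, intercept = np.polyfit(z_fa, z_hit, 1)
xs = np.linspace(z_fa.min(), z_fa.max(), 100)
plt.scatter(z_fa, z_hit)
plt.plot(xs, slope * xs + intercept, '--')
plt.xlabel('z(false alarm)')
plt.ylabel('z(hit)')
plt.show()
```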
The slope of the trend line in both the V10 and V3 graphs indicates a user's ability to acquire more hits relative to acquiring more false alarms (V10 slope = 0.53; V3 slope = 0.65); the two trend lines are compared in Figure 3. A steep slope indicates that users can increase hits at a higher rate than they increase false alarms when adjusting their criterion. Conversely, a flatter slope means that users must accept more false alarms relative to the increase in hits. A perfectly diagonal line (slope = 1) indicates a proportional increase in hit and false alarm probability as the decision criterion is adjusted. According to signal detection theory, the two lines should be parallel; however, because there was variation in sensitivity and bias, the trend lines are not in fact parallel.
The sensitivity of each user was calculated as the difference between z(hit) and z(false alarm), as worked out below. Not every participant had the same sensitivity, which is expected since sensitivity is normally distributed. The average d' value for the V10 interface was 1.47±0.49, and the average d' for the V3 interface was 2.44±0.40, indicating that sensitivity was higher for the V3 interface. Relating these sensitivity values to the calculated beta values (criterion), the more liberal response bias is associated with a lower sensitivity score, whereas the more conservative bias is associated with a higher sensitivity. This makes sense: to respond more conservatively, the user must analyze the screen more intently, in contrast to the more relaxed approach of a liberal response bias, in which participants would report a signal at the slightest inclination of its presence.
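Using the aggregate z-scores reported above, the calculation works out as:

$$d' = z(\mathrm{hit}) - z(\mathrm{false\ alarm})$$

$$d'_{V10} = -0.65 - (-2.12) = 1.47 \qquad d'_{V3} = 0.30 - (-2.14) = 2.44$$

Both values match the reported group averages.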
Personal data for performance on the V10 and V3 interfaces allow for the calculation of personal sensitivity and beta. The results are represented on a receiver operating characteristic (ROC) curve, which gives a graphical representation of performance. Values in the upper left corner of the graph represent the highest level of performance, providing a guideline for data evaluation. If the signal strength and sensitivity remain constant, adjusting the beta level traces out a curved line. Since these ROC curves were generated from only one data point each, the curve serves as a guideline for how performance would shift if beta were manipulated. Points in the lower left of the curve indicate a conservative response, whereas points in the upper right indicate a liberal response.
Personal sensitivity (d') was 1.72 for V10 and 2.52 for V3, and the corresponding beta values were 14.76 and 29.04. Figures 4 and 5 show where these points fall on the ROC curves for the V10 and V3 interfaces; the data points were located by plotting the hit and false alarm rates. Compared to the group aggregate values, personal sensitivity was higher for both interfaces, and the personal beta values indicate a more conservative performance on the tasks. The iso-sensitivity curves themselves can be traced as sketched below.
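Holding d' fixed and sweeping the criterion produces the curved line described above. A sketch (assuming matplotlib and scipy; the function name is illustrative, and the exact method used to draw Figures 4 and 5 was not specified):

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

def iso_sensitivity_curve(d_prime, n_points=200):
    """Trace the ROC curve swept out by moving the criterion while
    holding sensitivity (d') fixed, under the equal-variance model."""
    criteria = np.linspace(-4.0, 6.0, n_points)
    fa = norm.sf(criteria)              # P("yes" | noise)
    hits = norm.sf(criteria - d_prime)  # P("yes" | signal)
    return fa, hits

# Personal d' values reported above for the V10 and V3 interfaces
for d in (1.72, 2.52):
    fa, hits = iso_sensitivity_curve(d)
    plt.plot(fa, hits, label=f"d' = {d}")
plt.xlabel('P(false alarm)')
plt.ylabel('P(hit)')
plt.legend()
plt.show()
```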
The principles of signal detection theory can be applied to interface design, helping guide decisions with the goal of optimizing performance and creating a positive user experience. This exercise demonstrated the uncertainty inherent in decision making, showing that signals are not always apparent when masked by internal noise (the nervous system) and external noise (the "V" units). Improvements to these interfaces that could increase user performance include adjustments to decision-making time, signal strength and type, display specifications, and use environment. If the program ran at a slower rate, participants would have more processing time for each screen of information, allowing them to scan the screen more completely and thereby increasing decision confidence. If increasing decision-making time is not possible, differentiating the signal from the noise in some other way would help users detect it. If the Y signals were displayed in another color or a different font, detection would have been much easier and performance would have improved on both interfaces; such improved performance would plot higher on the y-axis than the group results for both the V10 and V3 interfaces. Likewise, if the signal had an accompanying auditory tone or tactile vibration, detecting it would not rely on visual acuity alone. Improving screen resolution would improve signal definition, making each unit more recognizable at a glance. Environmentally, reducing screen glare from light sources and ensuring adequate lighting would also affect performance, and providing a comfortable environment while limiting sources of distraction could help users maintain concentration on the task.
References
Wickens, C. D., Hollands, J. G., Banbury, S., & Parasuraman, R. (2002). Signal detection and absolute judgement. Engineering Psychology and Human Performance, 4, 7-48.