Is It Possible to Read Other People’s Confidence While Testing Their Implicit Learning?

Recent studies have shown that observers can accurately read a partner's confidence in their decision without verbal information exchange. The main question of the present study is whether the metacognitive experiences of others can be read accurately when their decisions are based on implicit knowledge. The second question concerns the predictors of such mindreading, if it is possible. The experiment was conducted using an artificial grammar learning paradigm. Participants worked in dyads: the Learner implicitly learned to classify stimuli as grammatical or non-grammatical, and the Observer watched the classification process while having no access to the stimuli and not communicating with the partner. The Observer's judgment of the Learner's confidence and the Learner's judgment of his/her own confidence were recorded. The results demonstrate that the Learner's confidence judgments correlate with the Observer's judgments of his/her confidence. Moreover, only the confidence judgments of the Learner correlate with the classification accuracy. It is concluded that intrapersonal and interpersonal confidence judgments are partially guided by the same criteria (in particular, response time); however, the Learners' judgments of their own confidence in the decisions are more sensitive to implicit knowledge.
Correspondence: Alina Savina, alinasavn@gmail.ru, St. Petersburg University, 7/9 Universitetskaya nab., 199034 St. Petersburg, Russia; Nadezhda Moroshkina, moroshkina.n@gmail.com.


Introduction
The present study focuses on the interrelation between metacognition (cognition about one's own cognition) and mind reading (understanding the minds of others). For many years, metacognition and mind reading research developed as two parallel and non-intersecting fields, but a more recent research trend aims at comparing these two processes (Vuillaume et al., 2019). Metacognition usually refers to the ability to monitor and control one's own cognitive processes (Flavell, 1979). This ability is most often studied using retrospective confidence judgments, which are understood as a self-evaluation of certainty in a given response. Two measures can be distinguished that relate to two different aspects of metacognition: metacognitive bias and metacognitive sensitivity. Metacognitive bias is understood as the overall level of reported confidence; for example, a participant may tend to report high confidence or low confidence regardless of his/her performance. Meanwhile, metacognitive sensitivity refers to a participant's ability to distinguish between different levels of their own performance, such as distinguishing between correct and incorrect answers in each task.
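The distinction between bias and sensitivity can be illustrated with a minimal sketch for dichotomous (sure / not sure) confidence reports. The function name and the difference-score index of sensitivity are assumptions made for illustration only; they are not the measures used in the analyses of this study, which rely on regression models.

```python
def metacognitive_summary(correct, confident):
    """Summarize binary confidence reports (illustrative indices only).

    correct   -- list of booleans, True when the answer was correct
    confident -- list of booleans, True when the participant reported "sure"
    """
    if len(correct) != len(confident) or not correct:
        raise ValueError("need two non-empty lists of equal length")

    # Bias: overall tendency to report "sure", regardless of accuracy.
    bias = sum(confident) / len(confident)

    sure_when_correct = [c for ok, c in zip(correct, confident) if ok]
    sure_when_wrong = [c for ok, c in zip(correct, confident) if not ok]
    if not sure_when_correct or not sure_when_wrong:
        raise ValueError("need at least one correct and one incorrect trial")

    # Sensitivity (assumed index): P(sure | correct) - P(sure | incorrect).
    # Zero means confidence does not separate correct from incorrect answers.
    sensitivity = (sum(sure_when_correct) / len(sure_when_correct)
                   - sum(sure_when_wrong) / len(sure_when_wrong))
    return bias, sensitivity


if __name__ == "__main__":
    accuracy = [True, True, True, False, False, False]
    reports = [True, True, False, False, False, True]
    print(metacognitive_summary(accuracy, reports))
```

Two participants can share the same bias (for example, both report "sure" on half of the trials) while differing in sensitivity, which is why the two aspects are treated separately.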
Studies of metacognitive sensitivity in various types of tasks indicate that participants often notice their mistakes even in the absence of feedback, and are able to report levels of confidence in their choices that correlate with objective performance. Moreover, people are able to demonstrate metacognitive sensitivity above the level of chance even when applying implicit knowledge, that is, when they cannot verbally substantiate their decision. This situation can be described as a dissociation of two types of knowledge: we know that we know, but do not know what we know (for more details see Ivanchei, 2014). There is an assumption that the reported level of confidence when applying implicitly acquired knowledge is based on intuitive feelings (Price & Norman, 2008). Furthermore, intuitive feelings may stem from processing fluency, defined as the ease with which information is processed in a cognitive system (Topolinski, 2009). Studies have shown that increasing processing fluency leads to positive emotions, which can be recognized by others as a corresponding facial expression. The ability to evaluate one's own confidence in an answer is important, but the ability to read another person's confidence is just as important, as it helps to more accurately predict the decisions made by others and their behavior. There are very few studies of a person's ability to read another person's confidence, unlike the many studies on the theory of mind and the ability to read the emotions of others. However, it has been proposed that intrapersonal confidence judgments are generated in a way similar to evaluating others' confidence in their own performance (interpersonal confidence judgments). This idea echoes the approach of Carruthers, which suggests that the ability to understand one's own confidence is the result of applying to oneself a mechanism for understanding the confidence of others.
Carruthers speculates that the assessment of one's own experience involves an anticipatory or imaginative communicative framing. If so, then we can expect a correlation between intrapersonal confidence judgments and interpersonal confidence judgments. We can also expect that by reading another person's confidence one can indirectly predict the mistakes of others, and the accuracy of such forecasts may not be inferior to the judgments of oneself.
As already mentioned, until recently researchers paid little attention to people's ability to read other people's confidence. Even fewer works are devoted to comparing intra- and interpersonal confidence judgments. Recent studies (e.g., Vuillaume et al., 2019) showed that one is able to read the confidence of others in the absence of verbal communication. Moreover, the accuracy of judging one's own confidence does not differ from the accuracy of reading the confidence of others (Vuillaume et al., 2019). This suggests that self-observation has no advantages over observing others. Another study compared metacognition measures obtained in laboratory conditions with data on the participants' performance in daily conditions obtained from their significant others. It was found that it is information from the reports of others, and not self-reports, that predicts metacognitive accuracy. All of the above results were obtained on the basis of perceptual tasks.
However, the ability to read another person's confidence in a situation where decisions are based on implicit learning remains unclear. To investigate this, we conducted an experiment using artificial grammar learning (AGL), a classical paradigm proposed by Reber (1967). Usually, the AGL experiment consists of two stages. First, the participants memorize a set of strings of letters, and after that they learn that the strings have been constructed under the rules of some artificial grammar. In the second stage, they classify newly presented strings as corresponding or not corresponding to the grammar. As Reber noted (Reber, 1989, p. 233), when subjects classify new strings they often rely on an intuitive feeling, not being able to verbalize the rules of the grammar by themselves. As later studies have shown, this intuitive feeling can be caused by the familiarity of the string or its structure, as a result of which the string is easier to process (structural mere exposure effect). According to the theory of processing fluency, intuitive feelings can be used by subjects as cues, both for assessing the grammaticality of a string and for assessing their own confidence in an answer. Since intuitive feelings can be recognized through facial expression, we can assume that they are readable by an observer and can be used for interpersonal confidence judgments. Another significant cue for judging someone else's confidence in an answer can be the response time (Vuillaume et al., 2019). At the same time, if the presented string of letters is very similar or very dissimilar to the strings that were previously learned, the subject will be able to quickly make a decision on classification, and this decision will most likely be correct. Thus, it is logical to assume that the response time can be considered by the observer as a cue to the learner's confidence in the answer.
It can therefore be expected that interpersonal confidence judgments will correlate with the accuracy of grammatical judgments. Based on these assumptions, we have developed a method by which it is possible to study both intra- and interpersonal confidence judgments at the same time. Participants are grouped in dyads. The first participant (the Learner) performs classical artificial grammar learning tasks (AGL tasks) and after each classification judgment makes a judgment about their confidence in the answer. The second participant (the Observer) observes the decision-making process and makes a judgment on the confidence of the Learner. In order for the Observer to concentrate on observing the actions of the Learner and not to make any explicit guesses about task difficulty, we did not show the Observer the tasks or the answers of the Learner. However, so that the Observer could concentrate on the manifestations of the Learner's confidence and perceive his/her task as reasonable, we decided to use two types of feedback. In the first type, the Observer after each response was informed whether his/her answer coincided with a self-assessment of confidence by the Learner. In the second type, the Observer was informed whether his/her answer coincided with the accuracy of the classification of the Learner.
In a previous study (Savina & Moroshkina, 2018), it was shown that intrapersonal confidence judgments most often coincide with evaluating the other's confidence in their own performance. Observers were able to read the confidence of their partners performing the artificial grammar learning task. However, the level of classification accuracy was low, presumably due to being distracted by a partner who was present during the learning phase. It remains unclear if the Observers' readings of confidence were related to the expression of the Learners' implicit knowledge, as the learning level was too low. This study is intended to replicate an earlier experiment with some modifications.
According to previous studies, we expect the Learners to demonstrate the effect of implicit learning; that is, the accuracy of their classification judgments will be significantly higher than the chance level. In this case, we also expect that the confidence judgments of participants performing the artificial grammar learning task will correlate with classification accuracy, that is, that they will show metacognitive sensitivity.
For the participants who observe the process of stimuli classification, but do not participate in this process and do not have access to stimuli (the Observers), we expect that they will be able to read the Learners' confidence. Based on the assumption that Learners' confidence judgments and classification accuracy can manifest themselves through response time (among other ways mentioned above), response time was also recorded in the study. We considered Learners' confidence judgments, classification accuracy, and response time as potential predictors of Observers' judgments about Learners' confidence. It was also expected that the type of feedback could affect the contribution of each of the predictors.

Participants
Sixty-four dyads (128 volunteers) took part in the experiment (87 women, 41 men; mean age = 24.8 years, SD = 4.7). Each dyad was composed of people who were familiar with each other for a year or more. Before starting the experiment, the participants drew lots: one of the participants in the dyad performed in the experiment as an Observer, and the other one as a Learner.

Materials and Equipment
The experiment was run on a laptop (macOS) using PsychoPy 1.85.4 software to present stimuli and record responses. The following cognitive tasks were selected: 1) an artificial grammar learning task based on a combination of geometric forms (Grammar 1) and 2) an artificial grammar learning task involving a set of letter strings (Grammar 2). The rules for combining the stimuli of the first and second grammars were identical. Strings and combinations of geometrical shapes were generated in such a way that the order of the letters/shapes was governed by Reber's (1967) artificial grammar. Eighteen stimuli consistent with this grammar were used in the learning phase. In the test phase, 18 new consistent (grammatical) stimuli and 18 new inconsistent (ungrammatical) stimuli were used. The Learners were seated so that the viewing distance from the eye to the screen was approximately 60 cm. An additional keyboard was used to record the responses of the Observer, and headphones (JBL Tune 600BTNC, noise-cancelling) were used to provide feedback to the Observer in the audio format.
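How grammatical and ungrammatical strings can be produced from a finite-state grammar may be sketched as follows. The transition table below is hypothetical and chosen only for illustration; it is not Reber's (1967) actual grammar, and the function names, the maximum string length, and the single-symbol substitution used to create ungrammatical items are likewise assumptions rather than the procedure used to build the stimuli of this study.

```python
import random

# Hypothetical finite-state grammar: state -> list of (symbol, next_state).
# A next_state of None marks a legal exit from the grammar.
# This table is illustrative and is NOT Reber's (1967) transition table.
GRAMMAR = {
    0: [("T", 1), ("V", 2)],
    1: [("P", 1), ("T", 3)],
    2: [("X", 2), ("V", 3)],
    3: [("S", None), ("X", 1)],
}
ALPHABET = "TVPXS"


def generate_string(rng, max_len=8):
    """Random walk through the grammar; retry until a walk exits in time."""
    while True:
        state, symbols = 0, []
        while state is not None and len(symbols) < max_len:
            symbol, state = rng.choice(GRAMMAR[state])
            symbols.append(symbol)
        if state is None:
            return "".join(symbols)


def is_grammatical(string):
    """True if the string traces a path from the start state to an exit."""
    state = 0
    for symbol in string:
        if state is None:
            return False  # extra symbols after a legal exit
        transitions = dict(GRAMMAR[state])
        if symbol not in transitions:
            return False
        state = transitions[symbol]
    return state is None


def make_ungrammatical(string, rng):
    """Replace one symbol until the result violates the grammar."""
    while True:
        pos = rng.randrange(len(string))
        options = [s for s in ALPHABET if s != string[pos]]
        candidate = string[:pos] + rng.choice(options) + string[pos + 1:]
        if not is_grammatical(candidate):
            return candidate


if __name__ == "__main__":
    rng = random.Random(1)
    grammatical = [generate_string(rng) for _ in range(18)]
    ungrammatical = [make_ungrammatical(s, rng) for s in grammatical]
    print(grammatical[:3], ungrammatical[:3])
```

The same generator could drive either letter strings or shape sequences by mapping each symbol to a letter or a geometric form, which is one way two surface materials can share identical combination rules.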

Design and Procedure
The participants were told that they were taking part in a study of intuition. The experiment consisted of learning and test phases. In the learning phase, one of the participants (the Learner) was told that he/she would be presented with letter strings or combinations of shapes, and his/her task was to memorize the presented stimuli. A learning list was presented twice in a random order, resulting in 36 presentations in total. During the learning phase, the stimuli appeared in the center of the screen in white text on a black background for 3000 ms; this was followed by a blank screen for 2000 ms. Meanwhile, the second participant (the Observer) waited for the end of the learning phase, out of sight of the Learner.
After the learning phase, the Learner was informed that the stimuli obeyed a complex set of rules that governed the order of letters/figures, and he/she was asked to classify 36 novel strings according to the stimuli's grammaticality. During the test phase, each stimulus was presented for 7000 ms and then disappeared. The time for a response by a key press was not restricted. After the classification of each stimulus, the Learner was asked to assess his/her decision confidence on a dichotomous scale (sure / not sure). Before the start of the test phase, the Observer returned to the field of view of the Learner. During the test phase, the Observer sat in front of the Learner so that he/she could see the partner's face, but was not able to observe the stimuli presented to the Learner and the given answers. The Observer was asked to assess their partner's confidence on a dichotomous scale (sure / not sure) using a separate keyboard. After each answer of the Learner, the Observer received a sound signal that informed him/her that the partner had just completed the task (see Figure 1). Then the Observer entered his/her response and received a sound feedback signal, which informed him/her about the conformity of his/her confidence judgments and the judgments of the Learner. All dyads were randomly assigned to one of three groups (group 1 had 22 dyads; groups 2 and 3 each had 21 dyads). The groups differed in the order of the Learner's and Observer's responses, as well as in the type of feedback to the Observer (see Table 1). In group 1, the Learner's judgments of confidence preceded the response of the Observer. First, the Learner assessed his/her confidence, and then the Observer partner received a sound signal and gave his/her answer. In group 2 and group 3, the Learner assessed his/her confidence after the Observer gave his/her judgment. The Observer received a signal after the Learner classified the stimulus.
In group 1 and group 3, the sound feedback reported on the compliance of the ratings of confidence of the Observer and the Learner. In group 2, feedback reported on the compliance of the Observer's confidence ratings and the classification of stimuli by the Learner.
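The three group conditions can be summarized in a short sketch. The names `LEARNER_RATES_FIRST` and `observer_feedback` are hypothetical, and the rule for group 2 assumes that a "sure" judgment by the Observer counts as matching a correct classification and a "not sure" judgment as matching an incorrect one; the text does not spell out this mapping.

```python
# Whether the Learner reports confidence before the Observer responds.
LEARNER_RATES_FIRST = {1: True, 2: False, 3: False}


def observer_feedback(group, observer_sure, learner_sure, learner_correct):
    """Return True when the feedback tone would signal a match.

    Groups 1 and 3: the Observer's judgment is compared with the Learner's
                    own confidence report.
    Group 2:        the Observer's judgment is compared with the accuracy of
                    the Learner's classification (assumed mapping:
                    sure <-> correct, not sure <-> incorrect).
    """
    if group in (1, 3):
        return observer_sure == learner_sure
    if group == 2:
        return observer_sure == learner_correct
    raise ValueError("group must be 1, 2, or 3")


if __name__ == "__main__":
    # Learner was unsure but happened to classify correctly;
    # the Observer judged the Learner to be sure.
    for group in (1, 2, 3):
        print(group, LEARNER_RATES_FIRST[group],
              observer_feedback(group, True, False, True))
```

The example trial shows why the feedback type matters: the same Observer response is signalled as a mismatch in groups 1 and 3 but as a match in group 2.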
After completion of the first part of the experiment, the Learner and the Observer changed roles and the procedure was repeated using different stimulus material (Grammar 1 was replaced by Grammar 2). After the experiment, a post-experimental interview was conducted in which participants reported on the strategy that they used while reading their partner's confidence.

Results
All statistical analyses were performed using R (R Core Team, 2019) and lme4 1.1-10 (Bates et al., 2015). In the first stage, descriptive statistics were calculated for all dependent variables (the accuracy of the Learners' classification judgment, the Learners' response time, the Learners' confidence judgments, the Observers' judgments about the Learners' confidence). Then the learning level in the three groups was analyzed (preliminary data were aggregated for the subjects), since we were interested in testing the association of the Observers' ability to read the Learners' confidence and the expression of implicit knowledge. The dyads in which the Learners did not demonstrate implicit learning were removed from the following analysis (see the next paragraph for more details). After that, using mixed-effect regression models, we analyzed the relationship between the accuracy of classification judgments and the classification response time of the Learners. Both of these parameters were included as predictors of judgments of confidence.
At the second stage, to test the main hypotheses, we analyzed the predictors of the Learners' confidence judgments and the Observers' judgments using mixed-effect regression models.

A General Analysis of the Effect of Implicit Learning and Confidence in the Three Groups
The analysis was performed on the combined data of both grammars, since no significant differences were found between the levels of implicit learning for the two grammars (F(1, 127) = 0.96; p = .800; ηp² = .001). Participants in all three groups demonstrated a learning effect, with classification performance in the test phase differing significantly from chance level (50 %): group 1, 66.91 % (SD = 10) correct classifications, t(43) = 11.14, p < .001; group 2, 65.18 % (SD = 12), t(41) = 8.01, p < .001; and group 3, 65.21 % (SD = 14), t(41) = 6.94, p < .001. The learning level turned out to be higher than that which we obtained in a similar experiment earlier (Savina & Moroshkina, 2018). According to the one-way ANOVA, there are no significant differences in implicit learning between the groups (F(2, 126) = 0.288; p = .070; ηp² = .050). As expected, the study showed no differences in implicit learning levels across groups with different types of feedback.
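The comparison with chance level is a one-sample t-test, which can be approximated from the summary statistics alone. The sketch below assumes n = 44 for group 1 (inferred from the reported 43 degrees of freedom); because the reported mean and SD are rounded, the result only approximates the published t value.

```python
import math


def one_sample_t(mean, sd, n, mu0):
    """t statistic and degrees of freedom for H0: population mean == mu0."""
    standard_error = sd / math.sqrt(n)
    return (mean - mu0) / standard_error, n - 1


if __name__ == "__main__":
    # Group 1: 66.91 % correct, SD = 10, chance level = 50 %.
    # n = 44 is an assumption derived from the reported df of 43.
    t_value, df = one_sample_t(66.91, 10.0, 44, 50.0)
    print(f"t({df}) = {t_value:.2f}")
```

With these rounded inputs the statistic comes out close to, though not identical with, the reported t(43) = 11.14.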
Before further analysis, the data of 16 participants were excluded, since they did not demonstrate implicit learning: the percentage of correct classification judgments was 50 % or lower. Also, the data from two participants were excluded from subsequent analysis because they did not use one of the two response categories in judgments of confidence. The first participant judged all answers as confident, and the other judged all answers as non-confident. This might reflect low motivation or a wrong idea regarding the purpose of the study. After the exclusions, the number of participants per group was 41, 34, and 35 in groups 1, 2, and 3, respectively. Finally, trials in which the response time exceeded the 3-sigma threshold were also excluded per participant (0.2 % of trials).
[Table 1. Test Phase in Three Groups. Columns: Group, Step 1, Step 2, Step 3, Step 4.]
The average classification response time of the Learners in the three groups was: in group 1, M = 4.09 seconds, SD = 2.40; in group 2, M = 3.92 seconds, SD = 2.42; in group 3, M = 4.05 seconds, SD = 2.20. To analyze the response times of the Learners, a mixed-effect regression model was fitted. The model included the group as a fixed effect and the accuracy of classification judgment and intercept for participants as random effects. The interaction between the accuracy of classification judgment and the group was also included (AIC = 17123.44, Pseudo-R² = .28). P-values were obtained by likelihood ratio tests of the full model with the effect in question against the model without the effect in question. It was found that the accuracy of classification judgment had a significant effect on the response time of the classification judgment: correct answers were faster than incorrect ones (β = −0.621, SE = 0.112, t(3988) = −5.533, p < .0001). Also, a significant interaction of the group factor and the accuracy of classification judgment factor was found in group 2: differences in response times between the correct and incorrect answers were less apparent than in the other two groups (β = 0.488, SE = 0.168, t(3988) = 2.910, p = .004).
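The per-participant response time exclusion can be sketched as below. This is a minimal illustration under the assumption that "exceeded the 3-sigma threshold" means a response slower than the participant's own mean plus three sample standard deviations; the record layout (dictionaries with `participant` and `rt` keys) and the function name are hypothetical.

```python
import statistics
from collections import defaultdict


def trim_slow_trials(trials, n_sigma=3.0):
    """Drop trials slower than mean + n_sigma * SD of the same participant.

    trials -- list of dicts with keys 'participant' and 'rt' (seconds)
    """
    rts_by_participant = defaultdict(list)
    for trial in trials:
        rts_by_participant[trial["participant"]].append(trial["rt"])

    upper_limit = {}
    for participant, rts in rts_by_participant.items():
        mean_rt = statistics.mean(rts)
        # A single trial has no sample SD; treat it as zero so it is kept.
        sd_rt = statistics.stdev(rts) if len(rts) > 1 else 0.0
        upper_limit[participant] = mean_rt + n_sigma * sd_rt

    return [t for t in trials if t["rt"] <= upper_limit[t["participant"]]]


if __name__ == "__main__":
    data = [{"participant": "A", "rt": 4.0} for _ in range(35)]
    data.append({"participant": "A", "rt": 40.0})  # one extreme response
    kept = trim_slow_trials(data)
    print(len(data), "->", len(kept))
```

Computing the threshold separately for each participant keeps generally slow responders from losing a disproportionate share of their trials.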

The Analysis of Predictors of the Learners' Confidence Judgments
Next, we fitted a logistic mixed-effect model for the confidence of the Learner using non-aggregated data. Since the analysis described above revealed a relationship between the accuracy of classification judgment and response time, a collinearity analysis was performed, which did not reveal significant evidence for the predictors' collinearity (c-number (κ) = 5.1; r = -.09; see Table 3 for predictor correlations). The groups differed in the order of responses: in group 1, the Learners judged their confidence in the answer immediately after a classification decision, and in groups 2 and 3 they waited for the Observers to make their judgments. It can therefore be expected that the delay could affect the Learner's judgment of confidence. Thus, the model included response delay (as a factor) as a fixed effect and the Learner's response time, accuracy of classification judgment, and intercept for participants as random effects. The interaction between the accuracy of classification judgment and response delay, as well as the interaction between the response time and response delay, were included. The results confirm that only two factors, the accuracy of classification judgment and the response time of the classification judgment, serve as significant predictors of the Learners' confidence judgments. The reduced model shown in Table 2 (AIC = 4275.62) was obtained by successively eliminating insignificant factors from the full model (AIC = 4278.61; χ²(3) = 3.008, p = .390; see full model in Appendix A).

The Analysis of Predictors of the Observers' Confidence Judgments
A logistic mixed-effect model was fitted for the Observers' confidence judgments using non-aggregated data with the group as a fixed effect and the accuracy of the Learner's classification judgment, the Learner's confidence judgment, response time, and intercept for participants as random effects. We also included the interaction between the accuracy of classification judgment and the group, the interaction between the accuracy of classification judgment and response delay, and the same interactions for the Learner's confidence judgment into the model. As expected, mild collinearity was observed between the Learner's confidence judgment and response time (c-number (κ) = 6.9; r = -.30); there were no correlations between any of the other factors (c-number (κ) < 5; r < .1), see Table 3. To reduce the collinearity of those predictors, the Learner's confidence judgment was regressed against the response time.
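Regressing one predictor against another and keeping the residuals removes their linear overlap. The sketch below assumes an ordinary least-squares regression with an intercept; the text does not state which kind of regression was used for the dichotomous confidence judgment, and the sample data and function names are invented for illustration.

```python
def pearson_r(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


def residualize(y, x):
    """Residuals of y after an OLS regression of y on x with an intercept."""
    n = len(y)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


# Invented example data: response times (s) and confidence (1 = sure).
response_time = [2.1, 3.4, 2.8, 6.0, 4.9, 7.2, 3.0, 5.5]
confident = [1, 1, 1, 0, 1, 0, 1, 0]

confidence_residual = residualize(confident, response_time)

if __name__ == "__main__":
    print("before:", round(pearson_r(confident, response_time), 3))
    print("after: ", round(pearson_r(confidence_residual, response_time), 3))
```

The residualized confidence variable carries only the part of the Learner's confidence that response time does not linearly explain, so entering it alongside response time separates their contributions.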
P-values were obtained by likelihood ratio tests of the full model (AIC = 5112.96; see full model in Appendix B) against the reduced model without insignificant effects (AIC = 5103.27; χ²(7) = 4.316, p = .743; see Table 4). The model shows that the relation between the Observer's confidence judgments and accuracy of classification judgment is not statistically significant, while the relation between the Observer's confidence judgments and the Learner's confidence judgments is statistically significant, indicating that the Observer can track the confidence of the Learner. It is shown that the response time during classification judgment is a significant predictor of the Observer's confidence. Since the Learner's confidence judgment was regressed against the response time, these two variables' contribution to the Observer's judgment of confidence can be considered independent. The model showed no significant interactions of the factor of the group with other predictors. Thus, we can conclude that the type of feedback in the different groups did not have a significant impact on the studied effects.
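The reported likelihood ratio statistics can be cross-checked against the reported AIC values. Because AIC = 2k − 2 log L, the difference between the AICs of the full and reduced models equals 2Δk − χ², where Δk is the number of parameters removed. The sketch below applies this identity to the two model comparisons reported in the text; the function names are hypothetical, and the chi-square tail probability is computed with a closed form that holds for integer degrees of freedom.

```python
import math


def lrt_from_aic(aic_full, aic_reduced, df_diff):
    """Likelihood ratio chi-square recovered from two AIC values.

    AIC_full - AIC_reduced = 2 * df_diff - chi2, hence the line below.
    """
    return 2 * df_diff - (aic_full - aic_reduced)


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df."""
    z = x / 2.0
    if df % 2 == 0:
        series = sum(z ** i / math.factorial(i) for i in range(df // 2))
        return math.exp(-z) * series
    # Odd df: start from erfc for df = 1 and add the recurrence terms.
    series = sum(z ** (j - 0.5) / math.gamma(j + 0.5)
                 for j in range(1, (df - 1) // 2 + 1))
    return math.erfc(math.sqrt(z)) + math.exp(-z) * series


if __name__ == "__main__":
    comparisons = {
        "Learner model": (4278.61, 4275.62, 3),
        "Observer model": (5112.96, 5103.27, 7),
    }
    for label, (aic_full, aic_reduced, df) in comparisons.items():
        chi2 = lrt_from_aic(aic_full, aic_reduced, df)
        print(f"{label}: chi2({df}) = {chi2:.2f}, p = {chi2_sf(chi2, df):.3f}")
```

Both comparisons come out close to the statistics given in the text, which suggests that the reported AIC, χ², and p values are mutually consistent.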

Discussion
As in the previous study (Savina & Moroshkina, 2018), the Learners showed a high confidence level in all three groups for both grammars. Unlike the previous experiment, however, the Learners demonstrated metacognitive sensitivity to their classification accuracy: they judged correct answers as confident, and incorrect answers as non-confident, with greater frequency. Moreover, response times during the classification significantly varied depending on the accuracy of the classification judgments: the correct answers were faster than the incorrect ones (in groups 1 and 3). Interestingly, both variables, accuracy and the response time of the classification judgments, were independent significant predictors of judgments of the Learner's confidence in all three groups (see the analysis above).
Our hypothesis that the Observers can read the confidence of the Learners has been confirmed: it was found that a significant amount of the Observers' confidence judgments are consistent with the Learners' confidence judgments. However, the relation between the confidence judgments of the Observers and the accuracy of classification judgments of the Learners was not shown to be statistically significant. Both of these results apply to all three groups, regardless of the type of feedback. Additionally, it was found that response time was a statistically significant predictor for the Observer's confidence judgments. This is in line with previous results (Vuillaume et al., 2019), which showed that in conditions where the Observer does not have information about the problem being solved, the time of the Learner's response becomes the main predictor for reading their confidence. Although response time is a predictor of the Learner's confidence, we can claim that both variables, the Learner's response time and the Learner's confidence, were independent predictors of the Observer's confidence judgments. The results obtained indicate that there may be some additional criteria, besides response time, that contribute to both the confidence of the Learner and the confidence judgments of the Observer. Based on our results, it can be suggested that two components informed the Learners' confidence judgments. The first component is available for external reading, but it is not associated with the accuracy of classification based on implicit knowledge. The second component is associated with the accuracy of classification, but it is less susceptible to external observation. In all three groups in our experiment, the Learners performed not only the main task of classifying stimuli, but also the additional task of judging their own confidence in the answer. Thus, it is not known whether the results would be the same if the Learners performed only the main task.
It was previously shown that the need to make metacognitive judgments could affect implicit learning. Perhaps it was the task to judge their own confidence that led the Learners to form conscious criteria for confidence judgments, which, in turn, were shown and then read by the Observers.
In our study, only one signal directly detected by Observers was recorded: the response time of the Learners during the classification of stimuli. A post-experimental interview suggests that some other factors can influence the reading of confidence. Participants reported that when reading their partner's confidence, they relied on facial expressions (91 participants), gestures (31 participants), and response time (19 participants). We assume that, while reading the confidence of the partner, the Observers were guided by the partner's response time. Moreover, in a situation where the Observers have no access to the stimuli, they could interpret any delay in time as a sign of uncertainty. However, various factors could be the reasons for the delay. For example, the size of the stimulus (number of letters per line / number of geometrical shapes) is one possible factor. Such factors could reduce the accuracy of the Observers' judgments in comparison with the Learners' judgments. To test this hypothesis, additional measurements and more detailed data processing are required.
In contrast to the previous study, no relation was found between the judgments of the Observers and the Learner's accuracy of the classification judgments, despite the fact that in groups 1 and 3 a relation of response time with classification accuracy was found. Presumably, feedback to the Observers could act as an interfering factor: it could change their initially chosen strategy. More research is needed to verify this hypothesis (including a non-feedback control group). Further work should be aimed at identifying possible factors that may impede or improve reading other people's confidence during implicit learning.