Zhiwei Zhao, Yuhang Liu, Kai Xu • Semantic Scholar
**Finding:** This meta-analysis of 28 studies found that agility training incorporating perception-action coupling (reacting to external stimuli) produces large improvements in reactive agility performance (ES = 0.65) and moderate improvements in pre-planned agility, with junior and national-level athletes showing the greatest benefits.
**Why it matters:** This provides evidence-based guidance for designing training interventions that integrate perceptual-cognitive demands with physical movement, which could inform HCI research on interaction in dynamic, reactive environments.
**Method:** The authors used three-level meta-analysis to handle multiple effect sizes within studies and conducted both linear and non-linear meta-regressions to establish dose-response relationships for training interventions.
**Jargon:** Perception-action coupling refers to the integration of perceiving environmental stimuli with executing appropriate movement responses; a reactive agility test (RAT) measures responses to unpredictable stimuli, while a pre-planned agility test (PAT) measures predetermined movement patterns; Hedges' g is an effect-size measure that corrects for small-sample bias.
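To make the effect-size jargon concrete, here is a minimal sketch of how Hedges' g is computed from two group summaries; the numbers below are invented for illustration, not values from the meta-analysis.

```python
import math

def hedges_g(mean1, sd1, n1, mean2, sd2, n2):
    """Standardized mean difference with a small-sample bias correction."""
    # Pooled standard deviation across the two groups
    sp = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))
    d = (mean1 - mean2) / sp               # Cohen's d
    j = 1 - 3 / (4 * (n1 + n2) - 9)        # correction factor for small samples
    return j * d

# Hypothetical post-test agility scores for an intervention vs a control group
print(round(hedges_g(mean1=5.2, sd1=0.6, n1=20, mean2=4.8, sd2=0.7, n2=20), 2))
```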
Jun Hong Lim, M. Yunus, Wei Lun Wong • Semantic Scholar
**Finding:** This systematic review of 14 studies found that AI writing tools consistently enhance learners' motivation in English writing through five key mechanisms: increased confidence and self-efficacy, higher engagement and interest, improved feedback and support, enhanced autonomy, and reduced writing anxiety.
**Why it matters:** This provides evidence-based guidance for integrating AI writing tools in educational settings to improve student motivation and self-directed learning outcomes.
**Method:** The authors conducted a PRISMA-guided systematic review of empirical studies from 2021-2025 across six countries, examining various AI tools including ChatGPT, Grammarly, and Automated Writing Evaluation systems.
**Jargon:** Automated Writing Evaluation (AWE) systems are AI tools that automatically assess and provide feedback on written text quality and correctness.
**Finding:** This scoping review of 11 studies found that mHealth applications significantly reduced neck pain in 8 studies and improved functional outcomes, with high patient satisfaction and adherence rates. The apps delivered various exercise interventions including neck mobility routines and postural re-education programs.
**Why it matters:** This demonstrates how mobile technology can address a core HCI challenge - sustaining user engagement with therapeutic interventions - while providing evidence for designing health behavior change applications.
**Method:** Scoping review methodology searching four databases (PubMed, Embase, ProQuest, Google Scholar) from 2004-2024, including randomized controlled trials, pre-post designs, and cohort studies.
**Jargon:** mHealth refers to mobile health applications; NPRS (Numeric Pain Rating Scale), VAS (Visual Analog Scale), NDI (Neck Disability Index), PSFS (Patient Specific Functional Scale), MSK-HQ (Musculoskeletal Health Questionnaire), and SF-36 (Short Form-36) are standardized clinical outcome measures for pain, function, and health status.
Tayebeh Mahvar, H. Mashalchi, Ferdos Polarak et al. (6 authors) • Semantic Scholar
**Finding:** Tele-nursing (12 weeks of phone-based nurse counseling) significantly improved quality of life and medication adherence and reduced hospital readmissions in hypertensive patients compared to controls, with readmission rates dropping from 79% to 8%.
**Why it matters:** This demonstrates how remote support systems can effectively change health behaviors and outcomes, relevant for designing digital health interventions and understanding team-based care delivery.
**Method:** Quasi-experimental study with 96 hypertensive patients randomly assigned to intervention (phone counseling) or control groups, measured over 12 weeks using validated quality of life and medication adherence scales.
**Jargon:** Tele-nursing refers to remote nursing care delivered via telephone; MMAS (Morisky Medication Adherence Scale) is a standardized tool measuring how well patients follow prescribed medication regimens.
⭐ Longitudinal design ⚠️ Self-reported data only | Single study (not replicated)
**Finding:** A cross-sectional study of Filipino adolescents aged 14-17 found an association between time spent on social media and depression severity, though the study cannot establish causation.
**Why it matters:** This provides empirical evidence for discussions about technology's impact on wellbeing in HCI courses and highlights the importance of considering mental health outcomes when designing social platforms.
**Method:** The study used an analytical cross-sectional design with questionnaires incorporating the PHQ-9 Modified for teens depression scale and analyzed data using Pearson Chi-Square tests.
**Jargon:** PHQ-9 (Patient Health Questionnaire-9) is a standardized clinical tool used to screen for and measure the severity of depression symptoms.
⚠️ Self-reported data only | Cross-sectional design (no longitudinal data)
**Finding:** This engineering study demonstrates two methods for improving the earthquake resistance of buildings, spring-based vibration damping and magnetic levitation isolation, using simple shake table experiments with building models.
**Why it matters:** This is not relevant for HCI, teams, or goal-setting research and teaching as it focuses on structural engineering and earthquake resistance.
**Method:** The researchers built physical prototypes using shake tables with DC motors to simulate earthquakes, comparing fixed buildings against those with spring damping and magnetic levitation systems.
**Jargon:** Shake table refers to a platform that simulates earthquake motion for testing structures; magnetic levitation uses repelling magnets to create a gap that isolates structures from ground vibrations.
**Finding:** Nigeria has substantial renewable energy resources for hydrogen production but lacks a detailed national hydrogen policy with clear, measurable targets, creating a major obstacle to achieving its Energy Transition Plan goals.
**Why it matters:** This demonstrates how goal-setting requires specificity and measurability to be effective, particularly in complex multi-stakeholder environments like national energy policy.
**Method:** The study used comparative analysis between Nigeria's current hydrogen development status and international benchmarks, combined with comprehensive review of energy strategies and cost estimates.
**Jargon:** Green hydrogen is produced using renewable energy sources, while blue hydrogen is produced from natural gas with carbon capture technology.
**Finding:** Higher knowledge, positive perceptions, information media exposure, and health worker encouragement significantly increased HIV counseling and testing service utilization among at-risk populations, with perception being the strongest predictor (OR 5.6).
**Why it matters:** This demonstrates how individual cognition, social influence, and information access affect health technology adoption - principles directly applicable to HCI design and team-based health interventions.
**Method:** Case-control study comparing 75 service users vs 75 non-users from high-risk populations using logistic regression to identify utilization predictors.
**Jargon:** HCT = HIV counseling and testing services; OR = odds ratio (the ratio of the odds of an outcome with versus without a given factor); CI = confidence interval (a statistical range expressing the uncertainty of an estimate).
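As a quick illustration of the odds-ratio jargon, a sketch computed from an invented 2×2 table (not the study's data):

```python
# Hypothetical 2x2 table: perception (positive vs not) by HCT utilization (used vs not)
used_pos, not_used_pos = 50, 25    # among people with positive perceptions
used_neg, not_used_neg = 25, 75    # among people without positive perceptions

odds_pos = used_pos / not_used_pos   # odds of utilization given positive perception
odds_neg = used_neg / not_used_neg   # odds of utilization otherwise
print(odds_pos / odds_neg)           # odds ratio; values > 1 mean higher odds with the factor
```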
**Finding:** Each additional information source a woman accessed was associated with 4.66-fold higher odds of cervical cancer screening, yet despite 82% having heard of cervical cancer, only 6% had been screened and 92% felt inadequately informed.
**Why it matters:** This demonstrates how information access alone is insufficient without culturally appropriate design and trusted delivery channels - a critical lesson for designing health information systems in underserved communities.
**Method:** Mixed methods study with 174 rural Kenyan women using cross-sectional surveys and semi-structured interviews, analyzed through logistic regression and thematic analysis with integrated interpretation.
**Jargon:** No specialized terms requiring explanation for an HCI audience.
⚠️ Very small sample (n=21) | Self-reported data only | No control group mentioned | Single study (not replicated)
**Finding:** The Otago Exercise Programme (muscle strengthening, balance training, and walking exercises) administered 3 times weekly for 8 weeks significantly improved balance and reduced fall risk in post-stroke patients who could walk independently.
**Why it matters:** This provides evidence for a specific, structured intervention that could inform the design of rehabilitation technologies or goal-setting frameworks for stroke recovery teams.
**Method:** Quasi-experimental design with 30 post-stroke patients measured on standardized balance and fall risk scales before and after the 8-week intervention program.
**Jargon:** Berg Balance Scale (BBS) measures balance performance; Timed Up and Go Test (TUGT) assesses mobility and fall risk; Falls Efficacy Scale (FES) measures fear of falling; Brunnstrom recovery stage 4 indicates patients can perform voluntary movements outside of rigid movement patterns.
⚠️ No control group mentioned | Single study (not replicated)
**Finding:** School management teams (SMTs) play the central role in developing school-based curriculum in South African secondary schools, but effective curriculum development requires collaboration with both internal and external stakeholders, with success heavily influenced by principals' understanding of school-based curriculum concepts.
**Why it matters:** This demonstrates how distributed leadership and stakeholder collaboration in goal-setting (curriculum development) can impact team performance outcomes, providing a real-world case study for teaching about collaborative goal-setting processes in educational organizations.
**Method:** Mixed methods explanatory sequential design using confirmatory factor analysis on survey data from 279 educators and thematic analysis of principal interviews across performing and underperforming schools.
**Jargon:** School-based curriculum (SBC) refers to locally-developed curriculum that supplements or adapts national curriculum standards; School Management Team (SMT) includes principals, deputy principals, and department heads who oversee school operations; socio-cultural theory (SCT) emphasizes learning and development through social interaction and cultural context.
**Finding:** Knowledge work performance evaluation systems need reconstruction after ChatGPT's emergence, requiring a three-dimensional model measuring AI Tool Mastery, Collaborative Work Quality, and Human-AI Synergy rather than traditional human-only metrics. AI skills are now required in 27.8% of knowledge worker jobs (376% growth since ChatGPT launch) and command a 17.7% wage premium.
**Why it matters:** This directly impacts how you should teach performance evaluation in HCI and team contexts, as traditional assessment frameworks are inadequate for measuring human-AI collaborative work that's rapidly becoming standard.
**Method:** The researchers analyzed 5,000 LinkedIn job postings and 2,000 Indeed salary records from 2022-2024 to track skill requirement and compensation changes following ChatGPT's release.
**Jargon:** Human-AI Synergy refers to the emergent capabilities that arise from effective human-machine collaboration beyond what either could achieve independently; AI Tool Mastery means proficiency in using AI systems as work tools rather than just understanding AI concepts.
**Finding:** This literary analysis of Vicki Hastrich's memoir examines how humans cognitively engage with material objects and non-human entities, arguing that we can move beyond evolutionary patterns of "thinking through materiality" to "thinking about it" in more complex ways. The paper shows how the memoir reveals both human and non-human agency in these material interactions.
**Why it matters:** This provides a theoretical framework for understanding human-computer interaction and material engagement that could inform HCI design by recognizing the agency and cognitive complexity involved in interactions with technological artifacts.
**Method:** The study uses literary analysis of a memoir combined with interdisciplinary theoretical frameworks from cognitive archaeology and new materialist philosophy to examine material-cognitive relationships.
**Jargon:** "Thinking through materiality" vs "thinking about it" refers to evolutionary progression from using objects as cognitive tools to reflecting on our relationship with objects; "distributive agency" means agency is shared between humans and non-human entities rather than residing only in humans; "new materialism" is a philosophical approach that recognizes the active agency of material objects and natural forms.
Haonan Zhang, Dongxia Wang, Yi Liu et al. (5 authors) • Semantic Scholar
**Finding:** LLM-VA resolves the trade-off between jailbreak vulnerability and over-refusal in safety-aligned LLMs by aligning the "answer vector" with the "benign vector" through closed-form weight updates, making the model's willingness to respond causally dependent on its safety assessment. This approach achieves 11.45% higher F1 scores than baselines while maintaining 95.92% utility across 12 LLMs.
**Why it matters:** This demonstrates how understanding the internal representation structure of AI systems can lead to targeted interventions that solve seemingly fundamental trade-offs, which is relevant for teaching about AI system design and human-AI interaction safety.
**Method:** The researchers use SVMs to identify orthogonal answer and safety judgment vectors at each layer, then apply minimum-norm weight modifications to align these vectors without fine-tuning or architectural changes.
**Jargon:** Answer vector (v_a) represents the model's decision to respond; benign vector (v_b) represents the model's judgment of input safety; jailbreak refers to successfully getting the model to answer harmful queries; over-refusal means declining to answer benign queries.
⚠️ No control group mentioned | Single study (not replicated) | Limited statistical detail in abstract
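The SVM step in the method can be illustrated with a toy sketch: fit a linear classifier on synthetic "hidden states" and take its weight vector as the direction separating two behaviors. This is only a schematic stand-in for the paper's layer-wise procedure and closed-form weight update; the data, labels, and dimensions here are invented.

```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
d = 64  # toy hidden-state dimension

# Synthetic hidden states for prompts the model "answers" vs "refuses" (invented data)
answer_states = rng.normal(loc=0.5, scale=1.0, size=(200, d))
refuse_states = rng.normal(loc=-0.5, scale=1.0, size=(200, d))
X = np.vstack([answer_states, refuse_states])
y = np.array([1] * 200 + [0] * 200)

clf = LinearSVC(C=1.0).fit(X, y)
v_answer = clf.coef_[0] / np.linalg.norm(clf.coef_[0])   # unit-length "answer vector"

# A benign-vs-harmful classifier fit the same way would give a "benign vector";
# the paper then aligns the two directions via a minimum-norm weight modification.
true_dir = np.ones(d) / np.sqrt(d)                       # the planted separating direction
print(round(float(v_answer @ true_dir), 2))              # high value: the SVM recovers it
```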
Haitao Li, Fangmin Zhu, Yu Song et al. (8 authors) • Semantic Scholar
**Finding:** Malaysian and Chinese university students' motivation emerges from six key configurations that blend intrinsic desires (curiosity, competence, autonomy) with external pressures (grades, parental expectations, recognition), with timely effort-focused feedback being crucial for sustaining engagement and strategy use.
**Why it matters:** This demonstrates how feedback design and autonomy support directly impact student motivation and persistence, providing concrete strategies for improving educational outcomes in achievement-oriented systems.
**Method:** Semi-structured interviews with 25 undergraduates using purposive sampling across institutions, majors, genders, and years, analyzed through thematic analysis to identify emergent motivational patterns.
**Jargon:** Self-Determination Theory (SDT) focuses on three basic psychological needs: autonomy (feeling volitional), competence (feeling effective), and relatedness (feeling connected); expectancy-value theory examines how expectations of success and perceived task value influence motivation; attribution theory studies how people explain causes of success and failure.
Farah Mohamed Zain, Marini Kasim, Faizahani Ab Rahman et al. (4 authors) • Semantic Scholar
**Finding:** Researchers identified five key constructs (motivation, technology skills, instructional design, AR development tools, and AR application types) needed for an AR framework that helps teachers enhance student motivation, based on combining Gagne's instructional events with self-determination theory.
**Why it matters:** This provides HCI educators with a structured framework for integrating AR into team-based learning environments while addressing the gap between AR technology potential and practical teacher implementation.
**Method:** The study used design and development research (DDR) methodology with mixed-methods data collection from 35 technology teachers through purposive sampling and closed/open questionnaires.
**Jargon:** Design and Development Research (DDR) is a methodology focused on creating and refining educational products or frameworks; Self-Determination Theory (SDT) examines intrinsic motivation through autonomy, competence, and relatedness needs; Gagne's nine events of instruction is a systematic framework for designing effective learning experiences.
**Finding:** This theoretical study identifies four main barriers to student self-management in Chinese high schools under the "Double Reduction" policy: low student motivation, rigid institutional management, inadequate teacher training, and poor home-school collaboration. The authors propose a five-dimensional framework to shift from external control to autonomous development through updated philosophies, curriculum restructuring, management innovation, teacher role transformation, and collaborative education.
**Why it matters:** This work provides a systematic approach to fostering student self-regulation and autonomy that could inform HCI design for educational technologies and team-based learning environments in academic settings.
**Method:** The study uses case analysis of a provincial exemplary high school's reform practices to construct their theoretical framework.
**Jargon:** "Double Reduction" policy refers to China's educational reform aimed at reducing homework burden and after-school tutoring; "connotative development" means internal quality improvement rather than external expansion; "path dependency" describes institutional resistance to changing established management practices.
**Finding:** FOMO drives impulsive purchasing through anxiety-driven participation while JOMO promotes mindful, autonomous consumption through withdrawal from social pressure, creating a fundamental tension in how digital interfaces influence consumer decision-making.
**Why it matters:** This dual-emotion framework is essential for understanding how persuasive design in digital interfaces can either exploit anxiety (FOMO) or support user autonomy (JOMO), directly impacting how teams design ethical user experiences and set goals around user well-being.
**Method:** The study develops a theoretical framework by integrating affective psychology, self-determination theory, and dual-process models with case studies from flash-sale e-commerce, travel marketing, and digital detox brands.
**Jargon:** FOMO (Fear of Missing Out) refers to anxiety about missing rewarding experiences others are having; JOMO (Joy of Missing Out) refers to pleasure derived from disconnecting and focusing on personal priorities rather than social pressures.
**Finding:** Autonomy consistently enhances memory encoding and learning through predictive processing mechanisms, with autonomous choice activating reward-related brain regions more reliably than external monetary incentives.
**Why it matters:** This provides neurological evidence for designing educational technologies and team structures that prioritize learner autonomy over external rewards to achieve deeper learning outcomes.
**Method:** The dissertation used four empirical studies examining different operationalizations of autonomy (binary choices and active exploration) while measuring both cognitive and neural responses.
**Jargon:** Predictive processing refers to the brain's mechanism of forming expectations about upcoming information, which is enhanced when learners have autonomous control over their learning choices.
**Finding:** The paper develops a new primal-dual algorithm for multi-armed bandit problems where both rewards and constraints change over time, with theoretical guarantees bounding both dynamic regret and constraint violation.
**Why it matters:** This addresses realistic scenarios where recommendation systems or resource allocation must adapt to changing user preferences while respecting dynamic constraints like budget or fairness requirements.
**Method:** The approach extends online mirror descent with specialized gradient estimators to handle the dual challenge of unknown rewards and time-varying constraints.
**Jargon:** Multi-armed bandits are sequential decision problems where an agent repeatedly chooses actions to maximize rewards while learning; dynamic regret measures performance loss compared to the best sequence of actions in hindsight; primal-dual algorithms simultaneously optimize an objective and satisfy constraints.
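A rough numerical sketch of the primal-dual idea, not the paper's algorithm: an exponential-weights bandit (one instance of online mirror descent) chases reward while a Lagrange multiplier penalizes arms whose cost exceeds a budget. Rewards, costs, and step sizes are invented.

```python
import numpy as np

rng = np.random.default_rng(1)
K, T, budget = 3, 2000, 0.4
true_reward = np.array([0.3, 0.6, 0.8])   # arm 2 pays best...
true_cost = np.array([0.1, 0.3, 0.7])     # ...but its cost exceeds the per-round budget

weights = np.ones(K)   # primal: exponential weights over arms (mirror descent with entropy)
lam = 0.0              # dual: Lagrange multiplier on the cost constraint
eta, eta_dual = 0.05, 0.05

for t in range(T):
    probs = 0.95 * weights / weights.sum() + 0.05 / K    # small forced exploration
    arm = rng.choice(K, p=probs)
    r = rng.binomial(1, true_reward[arm])
    c = rng.binomial(1, true_cost[arm])

    # Importance-weighted Lagrangian payoff for the pulled arm
    payoff = (r - lam * c) / probs[arm]
    weights[arm] *= np.exp(eta * payoff / K)
    weights /= weights.sum()

    # Dual ascent: the penalty grows whenever realized cost exceeds the budget
    lam = max(0.0, lam + eta_dual * (c - budget))

print(np.round(weights / weights.sum(), 2), round(lam, 2))
```

Over many rounds the weights should drift toward the best arm that stays within budget (arm 1) rather than the highest-reward arm (arm 2).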
Rayna Hata, Masaki Kuribayashi, Allan Wang et al. (5 authors) • ArXiv
**Finding:** Blind users' delegation strategies with navigation robots evolved over repeated interactions, with participants developing clearer preferences about when to rely on the robot versus act independently through multiple real-world museum sessions.
**Why it matters:** This demonstrates how human-robot collaboration preferences are dynamic rather than static, suggesting HCI systems should adapt to users' evolving mental models and delegation strategies over time.
**Method:** Longitudinal repeated exposure study with six blind participants using a navigation robot in an actual museum environment with realistic tasks like crowd navigation and obstacle avoidance.
**Jargon:** Delegation refers to users' decisions about when to transfer control or responsibility to the automated system versus maintaining personal control of the navigation task.
⭐ Longitudinal design ⚠️ No control group mentioned | Single study (not replicated) | Limited statistical detail in abstract
**Finding:** HARMONI is a multimodal personalization framework that uses large language models to enable socially assistive robots to maintain long-term, personalized interactions with multiple users simultaneously through four integrated modules for perception, world modeling, user modeling, and response generation.
**Why it matters:** This demonstrates how HCI systems can move beyond single-session interactions to build sustained relationships with users, which is crucial for understanding long-term user engagement and personalization in human-computer interaction.
**Method:** The researchers conducted extensive evaluation including ablation studies on four datasets plus a real-world user study in a nursing home environment to test speaker identification, memory updating, and ethical personalization.
**Jargon:** Multimodal refers to processing multiple types of input (visual, audio, text); ablation studies systematically remove components to test their individual contributions; socially assistive robots are designed to help users through social interaction rather than physical tasks.
Nacereddine Sitouah, Marco Esposito, Francesco Bruschi • ArXiv
**Finding:** eIDAS 2.0 regulation attempts to integrate Self-Sovereign Identity (SSI) principles into European digital identity frameworks, but analysis reveals significant legislative gaps and implementation challenges that may undermine user control and privacy goals.
**Why it matters:** This highlights tensions between regulatory compliance and user-centered design principles that are fundamental to HCI research on identity systems and digital governance.
**Method:** Theoretical analysis of regulatory text and accompanying documentation compared against established SSI principles from existing literature.
**Jargon:** Self-Sovereign Identity (SSI) refers to decentralized identity models where users control their own digital credentials without relying on centralized authorities; eIDAS is the EU's electronic identification and trust services regulation framework.
Fan Yang, Renkai Ma, Yaxin Hu et al. (4 authors) • ArXiv
**Finding:** Anthropomorphism determines whether people care about robot mistreatment, while individual moral foundations shape how they justify their moral reasoning about robots. Low-progressivism individuals make character-based judgments about robot abuse, while high-progressivism individuals engage in future-oriented moral thinking.
**Why it matters:** This reveals how user characteristics and robot design features interact to shape ethical responses, which is crucial for designing socially acceptable robots and understanding team dynamics when robots are involved.
**Method:** Mixed-methods experiment where 201 participants watched videos of robots with varying human-like features (Spider, Twofoot, Humanoid) being physically mistreated, then completed moral reasoning assessments.
**Jargon:** Anthropomorphism refers to attributing human characteristics to non-human entities; moral foundations theory categorizes different bases for moral reasoning, with progressivism indicating liberal moral orientations focused on care and fairness.
⚠️ No control group mentioned | Single study (not replicated) | Limited statistical detail in abstract
**Finding:** A modular reasoning-driven approach that considers schema coverage, structural connectivity, and semantic alignment significantly outperforms simple embedding and LLM-prompting methods for routing natural language database queries in enterprise environments with multiple overlapping databases.
**Why it matters:** This demonstrates how structured reasoning approaches can outperform popular embedding-based methods in complex organizational contexts, relevant for teaching about systematic vs. intuitive design approaches in HCI systems.
**Method:** The researchers extended existing natural language-to-SQL datasets to create realistic multi-database benchmarks and compared their reasoning-based reranking strategy against embedding-only and direct LLM-prompting baselines.
**Jargon:** Schema coverage refers to how well a database structure matches query requirements; structural connectivity means relationships between database elements; semantic alignment is the meaningful correspondence between natural language terms and database concepts.
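A toy sketch of combining the three signals into a single routing score; the schemas, keyword lists, and weights below are invented, and the paper's reasoning-based reranker is considerably richer.

```python
def route(query_terms, schemas, w=(0.5, 0.2, 0.3)):
    """Score each candidate database by schema coverage, connectivity, and term alignment."""
    scores = {}
    for name, schema in schemas.items():
        cols = {c for table in schema["tables"].values() for c in table}
        coverage = len(query_terms & cols) / len(query_terms)                       # schema coverage
        connectivity = len(schema["foreign_keys"]) / max(len(schema["tables"]), 1)  # structural signal
        alignment = len(query_terms & set(schema["keywords"])) / len(query_terms)   # crude semantic proxy
        scores[name] = w[0] * coverage + w[1] * connectivity + w[2] * alignment
    return max(scores, key=scores.get), scores

# Invented enterprise databases with overlapping content
schemas = {
    "sales_db": {"tables": {"orders": ["order_id", "customer", "amount"],
                            "customers": ["customer", "region"]},
                 "foreign_keys": [("orders.customer", "customers.customer")],
                 "keywords": ["revenue", "customer", "order"]},
    "hr_db":    {"tables": {"employees": ["employee_id", "salary", "dept"]},
                 "foreign_keys": [],
                 "keywords": ["salary", "employee", "headcount"]},
}
print(route({"customer", "amount", "region"}, schemas))   # should pick sales_db
```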
Andre Paulino de Lima, Paula Castro, Suzana Carvalho Vaz de Andrade et al. (6 authors) • ArXiv
**Finding:** The paper presents a recommendation model designed specifically for gerontological primary care that uses the structure of psychometric data to generate visual explanations for care plan recommendations that healthcare professionals can understand and trust.
**Why it matters:** This demonstrates how to design interpretable AI systems for high-stakes domains where users need to understand system reasoning, which is crucial for HCI courses covering explainable AI and trust in automated systems.
**Method:** The researchers conducted both offline performance evaluation on Brazilian healthcare datasets and user studies to evaluate how well healthcare professionals could interpret the visual explanations generated by their model.
**Jargon:** Psychometric data refers to measurements of psychological attributes like cognitive abilities, personality traits, or mental health indicators that are commonly collected through standardized tests and questionnaires in healthcare settings.
TrungKhang Tran, TrungTin Nguyen, Gersende Fort et al. (8 authors) • ArXiv
**Finding:** The authors develop an incremental stochastic Majorization-Minimization (MM) algorithm that generalizes incremental stochastic EM for streaming data processing, proving it converges to stationary points and demonstrating superior performance on mixture of experts problems compared to standard optimizers like SGD and Adam.
**Why it matters:** This provides a new theoretical framework for incremental learning algorithms that could inform how teams approach collaborative learning from streaming data and adaptive goal-setting in dynamic environments.
**Method:** The approach uses stochastic approximation theory to prove convergence guarantees and validates performance on synthetic data plus two real-world datasets including a bioinformatics study of drought-stressed maize.
**Jargon:** Majorization-Minimization (MM) is an optimization technique that iteratively replaces a complex objective function with simpler surrogate functions; mixture of experts (MoE) refers to models that use multiple specialized sub-models with a gating mechanism to determine which expert handles each input.
⚠️ No control group mentioned | Single study (not replicated)
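The core MM idea can be shown with a classic toy example (not the paper's incremental, streaming variant): to minimize a nondifferentiable objective, repeatedly minimize a smooth surrogate that upper-bounds it and touches it at the current iterate.

```python
import numpy as np

def mm_lasso_1d(y=2.0, lam=0.5, iters=30, x=2.0):
    """Minimize 0.5*(x - y)**2 + lam*|x| by Majorization-Minimization.

    |x| is majorized by the quadratic x**2 / (2*|x_k|) + |x_k| / 2, which touches
    |x| at the current iterate x_k; minimizing the quadratic surrogate gives a
    closed-form update at every step.
    """
    for _ in range(iters):
        x = y / (1.0 + lam / (abs(x) + 1e-12))   # minimizer of the surrogate
    return x

print(round(mm_lasso_1d(), 4))                    # ~1.5
print(np.sign(2.0) * max(abs(2.0) - 0.5, 0.0))    # soft-threshold solution, for comparison
```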
Ganesh Sundaram, Jonas Ulmen, Daniel Görges • ArXiv
**Finding:** This paper presents a component-aware pruning framework that uses gradient-based metrics (Gradient Accumulation, Fisher Information, and Bayesian Uncertainty) to identify which parts of multi-component neural network controllers can be compressed while preserving functionality, revealing structural dependencies that traditional norm-based pruning methods miss.
**Why it matters:** This work is relevant for HCI research involving complex AI systems where understanding which components are critical for performance could inform interface design and user interaction with AI-powered systems.
**Method:** The authors test their gradient-based importance estimation framework on an autoencoder and a TD-MPC agent, comparing against conventional norm-based pruning approaches.
**Jargon:** TD-MPC refers to Temporal Difference Model Predictive Control, a reinforcement learning approach; structured pruning means removing entire groups of parameters rather than individual weights; Fisher Information is a measure of how much information a parameter carries about the model's predictions.
⚠️ No control group mentioned | Single study (not replicated)
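A minimal PyTorch sketch of one of the named signals, a diagonal Fisher-style importance score obtained by accumulating squared gradients and aggregating them per output unit so whole structural groups can be ranked. The model, data, and grouping are invented; the paper's framework for multi-component controllers is more involved.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 2))
loss_fn = nn.MSELoss()

# Accumulate squared gradients (an empirical, diagonal Fisher proxy) over mini-batches
fisher = {name: torch.zeros_like(p) for name, p in model.named_parameters()}
for _ in range(20):
    x, y = torch.randn(32, 8), torch.randn(32, 2)
    model.zero_grad()
    loss_fn(model(x), y).backward()
    for name, p in model.named_parameters():
        fisher[name] += p.grad.detach() ** 2

# Structured importance: sum the scores over each output unit (row) of the first layer,
# so entire units can be ranked for pruning instead of individual weights
unit_importance = fisher["0.weight"].sum(dim=1) + fisher["0.bias"]
print(unit_importance.argsort())   # least-important units first (pruning candidates)
```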
Shanyv Liu, Xuyang Yuan, Tao Chen et al. (8 authors) • ArXiv
**Finding:** CASTER uses a dual-signal router that combines semantic and structural features to dynamically assign appropriate-strength AI models to different sub-tasks in multi-agent systems, reducing computational costs by up to 72.4% while maintaining performance quality.
**Why it matters:** This addresses a key challenge in designing efficient AI-powered collaborative systems where teams of agents must balance task performance with resource constraints—directly relevant to understanding how intelligent systems can optimize team coordination and goal achievement.
**Method:** The system uses a Cold Start to Iterative Evolution training paradigm where the router learns from its own routing failures through on-policy negative feedback, essentially learning which tasks need stronger vs. weaker models.
**Jargon:** Multi-Agent Systems (MAS) are networks of AI agents working together; LLM-as-a-Judge uses large language models to evaluate system performance; on-policy negative feedback means the system learns from mistakes made by its own decisions rather than external examples.
⚠️ No control group mentioned | Single study (not replicated) | Limited statistical detail in abstract
Peter Zeng, Weiling Li, Amie Paige et al. (9 authors) • ArXiv
**Finding:** Large Vision-Language Models (LVLMs) fail to develop effective common ground with humans in referential communication tasks, showing poor performance when trying to collaboratively identify unlabeled objects through interactive dialogue.
**Why it matters:** This reveals a fundamental limitation in current AI systems' ability to collaborate with humans, which is essential for designing effective human-AI interfaces and team interactions.
**Method:** The researchers used a factorial design comparing all possible director-matcher pairs (human-human, human-AI, AI-human, AI-AI) across multiple dialogue rounds to identify abstract objects without obvious names.
**Jargon:** Common ground refers to shared knowledge and understanding that develops between communicators; referential communication is the process of using language to help someone identify a specific object or concept; LVLMs are Large Vision-Language Models that can process both images and text.
⚠️ No control group mentioned | Single study (not replicated) | Limited statistical detail in abstract
Mónica Ribero, Antonin Schrab, Arthur Gretton • ArXiv
**Finding:** The authors develop kernel-based statistical tests using f-divergences that can detect different types of differences between datasets, with particular application to evaluating machine unlearning and differential privacy.
**Why it matters:** This provides new tools for HCI researchers to evaluate whether AI systems have truly "forgotten" user data when requested, which is increasingly important for user privacy and trust in interactive systems.
**Method:** They use regularized variational representations of f-divergences estimated through kernel methods, with adaptive hyperparameter selection for practical implementation.
**Jargon:** F-divergences are mathematical measures of difference between probability distributions; machine unlearning refers to removing specific data points from trained AI models; Hockey-Stick divergence is a specific type of f-divergence useful for privacy applications; witness function is the mathematical function that achieves the variational representation of the divergence.
⚠️ No control group mentioned | Single study (not replicated) | Limited statistical detail in abstract
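The paper's regularized variational f-divergence estimators are not reproduced here, but a simpler kernel two-sample quantity, an unbiased MMD estimate with a Gaussian kernel, gives a feel for kernel-based comparison of two datasets (e.g., model outputs before and after unlearning). The data and bandwidth are invented.

```python
import numpy as np

def rbf(X, Y, sigma=1.0):
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma**2))

def mmd2_unbiased(X, Y, sigma=1.0):
    """Unbiased estimate of the squared Maximum Mean Discrepancy between samples X and Y."""
    Kxx, Kyy, Kxy = rbf(X, X, sigma), rbf(Y, Y, sigma), rbf(X, Y, sigma)
    n, m = len(X), len(Y)
    np.fill_diagonal(Kxx, 0.0)
    np.fill_diagonal(Kyy, 0.0)
    return Kxx.sum() / (n * (n - 1)) + Kyy.sum() / (m * (m - 1)) - 2 * Kxy.mean()

rng = np.random.default_rng(0)
same = mmd2_unbiased(rng.normal(size=(200, 5)), rng.normal(size=(200, 5)))
shifted = mmd2_unbiased(rng.normal(size=(200, 5)), rng.normal(0.5, 1.0, size=(200, 5)))
print(round(same, 4), round(shifted, 4))   # near zero for matching samples, clearly positive for shifted ones
```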
**Finding:** Multimodal Large Language Models (MLLMs) can generate missing product information (text or images) for e-commerce but struggle with fine-grained alignment and show inconsistent performance across product categories, with model size not correlating with better performance.
**Why it matters:** This demonstrates significant limitations in current AI systems for real-world cross-modal tasks, relevant for understanding how teams might collaborate with AI tools and set realistic goals for AI-assisted content generation.
**Method:** The researchers created MMPCBench, a specialized benchmark testing six state-of-the-art MLLMs across nine e-commerce categories on both content quality and recommendation system performance for missing modality completion tasks.
**Jargon:** MLLMs are AI models that can process and generate both text and images; missing modality completion means generating absent product information (like creating product descriptions from images or vice versa); GRPO is a training technique to better align model outputs with desired outcomes.
⚠️ No control group mentioned | Single study (not replicated)
**Finding:** A new Bayesian statistical method improves accuracy when estimating health behaviors (like smoking) for small population subgroups by explicitly accounting for data coarsening issues like rounding and digit preference that commonly occur in self-reported survey data.
**Why it matters:** This addresses a fundamental data quality problem that affects many HCI studies relying on self-reported measures, where users may round responses or show digit preferences that bias results.
**Method:** The approach uses a two-part Bayesian model that separates prevalence estimation from intensity estimation while explicitly modeling the coarsening process, tested through simulation studies and applied to Italian smoking survey data.
**Jargon:** Small Area Estimation (SAE) refers to statistical methods for estimating parameters in domains with small sample sizes by borrowing information across related areas; coarsening mechanisms include rounding behaviors and digit preference (tendency to report certain numbers like multiples of 5 or 10) that reduce data precision.
**Finding:** LoPRo introduces a fine-tuning-free post-training quantization method that uses block-wise permutation and Walsh-Hadamard transformations to improve low-rank matrix quantization, achieving accuracy comparable to fine-tuning methods while being 4× faster.
**Why it matters:** This work is not directly relevant to HCI, teams, or goal-setting research and teaching.
**Method:** The authors developed a novel algorithm combining block-wise column permutation with Walsh-Hadamard transformations and introduced mixed-precision fast low-rank decomposition based on rank-1 sketch (R1SVD).
**Jargon:** Post-training quantization (PTQ) compresses neural network models by reducing numerical precision after training; low-rank approximation decomposes matrices into products of smaller matrices; Walsh-Hadamard transformation is a mathematical operation that rotates data while preserving certain properties; perplexity measures how well a language model predicts text (lower is better).
⚠️ No control group mentioned | Single study (not replicated)
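A small sketch of why rotating before quantizing helps: an orthonormal Walsh-Hadamard transform spreads outlier values across coordinates, so uniform quantization loses less. This illustrates the general trick only, not LoPRo's block-wise permutation or R1SVD pipeline; the matrix, outlier, and bit-width are invented.

```python
import numpy as np
from scipy.linalg import hadamard

def quantize(w, bits=4):
    """Symmetric uniform quantization to a given bit-width (per-tensor scale)."""
    scale = np.abs(w).max() / (2 ** (bits - 1) - 1)
    return np.round(w / scale) * scale

rng = np.random.default_rng(0)
n = 64
W = rng.normal(size=(n, n))
W[:, 0] *= 20                        # an outlier column, which hurts plain quantization

H = hadamard(n) / np.sqrt(n)         # orthonormal Walsh-Hadamard matrix (H @ H.T = I)

plain_err = np.linalg.norm(W - quantize(W))
rotated_err = np.linalg.norm(W - quantize(W @ H) @ H.T)   # rotate, quantize, rotate back
print(round(plain_err, 2), round(rotated_err, 2))         # the rotated version should lose less
```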
**Finding:** KeepLoRA prevents catastrophic forgetting in continual learning by restricting parameter updates to a residual subspace that's orthogonal to both pre-trained knowledge and previous task knowledge. The method identifies that general knowledge resides in the principal subspace while task-specific knowledge occupies residual subspaces.
**Why it matters:** This provides a principled approach for designing adaptive learning systems that can acquire new capabilities without losing existing ones, relevant for HCI applications where systems must continuously adapt to user needs.
**Method:** The approach uses gradient projection onto orthogonal subspaces within the LoRA (Low-Rank Adaptation) framework, combined with theoretical analysis of how different types of knowledge are encoded in parameter space.
**Jargon:** LoRA (Low-Rank Adaptation) is a parameter-efficient fine-tuning method that adds low-rank matrices to pre-trained models; continual learning refers to learning new tasks sequentially without forgetting previous ones; catastrophic forgetting is when neural networks lose previously learned knowledge when learning new tasks.
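The orthogonal-projection step at the heart of such methods can be sketched in a few lines: given an orthonormal basis U for directions that encode knowledge to protect, each gradient keeps only its component in the complementary (residual) subspace. The dimensions and basis are invented; this is not KeepLoRA's full procedure.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 32

# Orthonormal basis for the "protected" subspace (e.g., principal directions of prior knowledge)
U, _ = np.linalg.qr(rng.normal(size=(d, 4)))

g = rng.normal(size=d)              # a new-task gradient
g_proj = g - U @ (U.T @ g)          # drop the component that lies inside the protected subspace

print(round(float(np.abs(U.T @ g_proj).max()), 10))   # ~0: the update no longer touches protected directions
```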
Thomas Bömer, Nico Koltermann, Max Disselnmeyer et al. (5 authors) • ArXiv
**Finding:** A-CEoH combines LLMs with an evolutionary framework to automatically generate heuristic functions for A* search; by including the A* algorithm's code directly in the prompt for better contextual understanding, it produces heuristics that outperform expert-designed ones.
**Why it matters:** This demonstrates how AI can augment human expertise in algorithm design, relevant for teaching students about the intersection of AI assistance and traditional computer science problem-solving.
**Method:** The researchers extended an evolutionary framework (EoH) with a novel prompt augmentation strategy that embeds A* code into LLM prompts, then tested on warehouse logistics and sliding puzzle problems.
**Jargon:** EoH (Evolution of Heuristics) is a framework that uses evolutionary algorithms to automatically generate heuristic functions; A* is a graph search algorithm that uses heuristics to efficiently find optimal paths; UPMP refers to optimizing container arrangements in warehouses.
⚠️ No control group mentioned | Single study (not replicated)
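For readers who have not met A*, a compact grid example with the standard Manhattan-distance heuristic; the paper's contribution is to have an LLM evolve heuristic functions like the hand-written one below.

```python
import heapq

def astar(grid, start, goal):
    """A* on a 4-connected grid; cells equal to 1 are walls."""
    def h(p):                                     # admissible Manhattan-distance heuristic
        return abs(p[0] - goal[0]) + abs(p[1] - goal[1])

    frontier = [(h(start), 0, start)]             # (estimated total cost, cost so far, node)
    best_g = {start: 0}
    while frontier:
        _, g, node = heapq.heappop(frontier)
        if node == goal:
            return g                              # cost of an optimal path
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            r, c = node[0] + dr, node[1] + dc
            if 0 <= r < len(grid) and 0 <= c < len(grid[0]) and grid[r][c] == 0:
                ng = g + 1
                if ng < best_g.get((r, c), float("inf")):
                    best_g[(r, c)] = ng
                    heapq.heappush(frontier, (ng + h((r, c)), ng, (r, c)))
    return None

grid = [[0, 0, 0, 0],
        [1, 1, 0, 1],
        [0, 0, 0, 0]]
print(astar(grid, (0, 0), (2, 0)))   # 6 steps around the walls
```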
Runyu Peng, Yunhua Zhou, Demin Song et al. (7 authors) • ArXiv
**Finding:** Multi-head Explicit Attention (MEA) enables explicit communication between attention heads in large language models through learnable linear combinations, improving performance while allowing a 50% reduction in memory usage with minimal accuracy loss.
**Why it matters:** This demonstrates how explicitly designing for component interaction (rather than implicit coordination) can improve both performance and efficiency in AI systems that students increasingly use and need to understand.
**Method:** The approach combines Head-level Linear Composition modules that mix key/value vectors across attention heads with group normalization to align statistical properties of the recombined heads.
**Jargon:** Attention heads are parallel processing units in Transformer models that focus on different aspects of input; KV-cache stores key-value pairs to speed up text generation; low-rank decomposition reduces parameters by approximating full matrices with smaller components.
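A toy sketch of head-level linear composition: each head's key vectors are recombined from all heads through a mixing matrix before attention, which is the kind of explicit cross-head communication the paper describes (the actual module also recombines values, learns the mixing weights, and applies group normalization). The shapes and matrix here are invented.

```python
import numpy as np

rng = np.random.default_rng(0)
heads, seq_len, head_dim = 4, 6, 8

K = rng.normal(size=(heads, seq_len, head_dim))   # per-head key vectors
M = rng.normal(size=(heads, heads)) / heads       # head-mixing weights (learnable in the real model)

# Each new head's keys are a linear combination of every original head's keys
K_mixed = np.einsum("ij,jld->ild", M, K)
print(K_mixed.shape)   # (heads, seq_len, head_dim): same shape, but heads now share information
```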
**Finding:** LLM-Enhanced Reinforcement Learning (LERL) uses a two-level hierarchical system where an LLM selects diverse content categories at the high level while a reinforcement learning agent recommends specific items at the low level, significantly improving long-term user satisfaction compared to existing recommendation systems. This approach addresses the filter bubble problem that occurs when systems overfit to short-term user preferences.
**Why it matters:** This demonstrates how combining different AI approaches hierarchically can solve complex HCI problems, offering insights for designing systems that balance immediate user preferences with long-term engagement goals.
**Method:** The researchers used a hierarchical reinforcement learning framework with real-world datasets, where the LLM handles semantic planning for content diversity while RL manages personalized item selection within those semantic constraints.
**Jargon:** Filter bubble effects refer to users getting trapped seeing only similar content due to algorithmic overfitting to their immediate preferences; sparse, long-tailed interactions means most user-item combinations have little to no interaction data, with a few popular items dominating.
⚠️ No control group mentioned | Single study (not replicated)
Luisa Jansen, Tim Ulmann, Robine Jordi et al. (4 authors) • ArXiv
**Finding:** The paper proposes using red teaming (simulated re-identification attacks) to test the effectiveness of research data anonymization, where one team attempts to re-identify participants while another team tries to prevent it.
**Why it matters:** This provides HCI researchers with a practical method to validate their data anonymization practices before publishing datasets, addressing a widespread problem where researchers struggle with proper anonymization.
**Method:** The authors applied red teaming versus blue teaming methodology (borrowed from cybersecurity) to test anonymization of mixed-methods HCI study data and created reusable materials for other researchers.
**Jargon:** Red teaming refers to simulated attacks where one team tries to break security (here, re-identify participants), while blue teaming refers to the defensive team trying to prevent such attacks (here, protecting participant anonymity).
Yuansong Xu, Yichao Zhu, Haokai Wang et al. (10 authors) • ArXiv
**Finding:** Physicians' trust in LLMs for medical diagnosis is miscalibrated because their perceptions of LLM capabilities differ significantly from standardized benchmark performance measures. The study with 37 physicians reveals that what physicians value in clinical reasoning doesn't align with how LLMs are typically evaluated.
**Why it matters:** This demonstrates a critical gap between AI system evaluation methods and user trust formation that likely applies across many AI-assisted decision-making domains beyond healthcare.
**Method:** The researchers designed clinical cases, collected LLM analyses, and had physicians evaluate the LLM reasoning to quantitatively measure perceived diagnostic capabilities compared to benchmark performance.
**Jargon:** LLMs (Large Language Models) are AI systems like GPT that can process and generate human-like text; miscalibrated trust refers to when users' trust levels don't match the actual capabilities of the AI system.
Shuning Zhang, Qucheng Zang, Yongquan 'Owen' Hu et al. (10 authors) • ArXiv
**Finding:** VisGuardian uses group-based privacy controls that allow AR glasses users to efficiently manage visual data permissions by selecting one object to control entire groups of related private objects, rather than managing each object individually.
**Why it matters:** This demonstrates how interface design can make complex privacy controls usable at scale - a key principle for designing team collaboration tools and privacy-sensitive HCI systems.
**Method:** The researchers used YOLO object detection with pre-classified grouping schema and conducted a comparative user study (N=24) against slider-based and object-based baseline interfaces.
**Jargon:** YOLO is a real-time object detection algorithm; mAP50 (mean Average Precision at 50% intersection over union) is a computer vision accuracy metric measuring how well the system detects and locates objects.
**Finding:** A systematic review of immersive virtual reality (IVR) learning environments reveals significant gaps in supporting learners' metacognitive planning and motivational processes, leading to a new Self-Regulated Learning Support Framework (SRL-SF) for designing better VR learning experiences.
**Why it matters:** This provides a practical framework for designing educational VR systems that better support student goal-setting, planning, and self-regulation - key issues in HCI design for learning environments.
**Method:** The authors conducted a state-of-the-art literature review structured around Zimmerman's three-phase cyclical model of self-regulated learning (forethought, performance, self-reflection) and created an interactive web-based audit tool.
**Jargon:** Self-Regulated Learning (SRL) refers to learners' ability to set goals, monitor progress, and adjust strategies; "transposition challenge" means the difficulty of transferring traditional learning support tools into virtual environments; Immersive Virtual Reality (IVR) refers to fully immersive VR experiences that isolate users from physical learning aids.
Wei Wu, Elizabeth Margulis, Kelly Jakubowski • PsyArXiv
**Finding:** Music systematically influences the content of spontaneous thoughts, with people generating more semantically similar thoughts when listening to the same musical genre compared to different genres, and this effect varies by age group.
**Why it matters:** This demonstrates that supposedly "unconstrained" mental processes can be systematically influenced by environmental design choices, which has direct implications for how we design interfaces and environments to support creativity and focus in teams.
**Method:** Researchers used natural language processing to analyze open-ended thought descriptions from participants across four age groups who listened to ten different music genres, then measured semantic similarity of thoughts within and across genres.
**Jargon:** Semantic similarity refers to how closely related thoughts are in meaning and content, measured computationally through natural language processing techniques.
⚠️ No control group mentioned | Single study (not replicated)
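The abstract does not specify the exact NLP pipeline, but a standard way to score similarity between free-text thought descriptions is to embed them and take cosine similarity. Below is a minimal lexical (TF-IDF) sketch with invented responses; real semantic-similarity analyses would typically use richer sentence embeddings.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical open-ended thought reports from listeners
thoughts = [
    "driving down a long empty highway at night",
    "driving on the highway with the windows down",
    "thinking about my grocery list for the week",
]
vectors = TfidfVectorizer().fit_transform(thoughts)
print(cosine_similarity(vectors).round(2))   # the two driving-related thoughts score higher together
```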
Amy X. Li, Benjamin Chan, Sofya Donets et al. (5 authors) • PsyArXiv
**Finding:** Covert attention (mental focus without eye movement) and value-based choices influence each other bidirectionally - valuable options draw covert attention, while covert attention changes how we value options and make decisions. Covert and overt attention compete and have separate effects on choice behavior.
**Why it matters:** This challenges current decision-making models that only consider eye gaze, suggesting that interface design and team decision processes need to account for mental attention allocation beyond just where people look.
**Method:** The researchers combined an attentional probe task with value-based choice tasks to directly measure covert attention allocation to peripheral options during decision-making.
**Jargon:** Covert attention refers to mental focus or attention directed somewhere without moving the eyes, while overt attention involves actual eye movements and gaze direction.
**Finding:** The paper argues that borderline personality disorder (BPD) may meet traditional biomedical criteria for diagnostic validity but still lacks legitimacy because these criteria fail to account for how the diagnosis regulates identity, relationships, and social deviance while causing structural harm.
**Why it matters:** This demonstrates how established validation frameworks can be inadequate for evaluating complex constructs that affect human behavior and social interaction, which is relevant for HCI systems that classify or assess users.
**Method:** The analysis draws on feminist, decolonial, neurodiversity, and lived experience-led scholarship to critique traditional biomedical validation models.
**Jargon:** Robins and Guze criteria are traditional biomedical standards for establishing psychiatric diagnostic validity; diagnostic circularity refers to using symptoms to define a condition and then using the condition to explain those same symptoms.
**Finding:** This paper proposes a theoretical framework for converting narrative psychological case formulations into machine-readable causal graphs using large language models, treating clinical reasoning as a structured causal cognition process that can be computationally represented.
**Why it matters:** This framework could inform how we design AI-assisted decision-making tools for teams and how we represent complex causal reasoning in human-computer interfaces.
**Method:** The authors use conceptual workflow development with simple worked examples to demonstrate LLM-based conversion of clinical text to structured graphs, employing semantic similarity methods and graph comparison metrics like the Jaccard index.
**Jargon:** Psychological formulation refers to the clinical process of creating narrative explanations linking patient observations to underlying causes; causal graphs are structured diagrams representing cause-and-effect relationships; psychological ontologies are formal knowledge structures that define concepts and relationships in psychology.
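To make the graph-comparison metric concrete, a tiny sketch of the Jaccard index over the edge sets of two hypothetical causal graphs; the nodes and edges are invented.

```python
# Directed edges of two hypothetical causal graphs extracted from case formulations
graph_a = {("early loss", "low self-worth"),
           ("low self-worth", "avoidance"),
           ("avoidance", "isolation")}
graph_b = {("early loss", "low self-worth"),
           ("low self-worth", "avoidance"),
           ("insomnia", "low mood")}

jaccard = len(graph_a & graph_b) / len(graph_a | graph_b)
print(round(jaccard, 2))   # 2 shared edges out of 4 distinct edges -> 0.5
```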
**Finding:** The paper argues that debates about complex PTSD versus borderline personality disorder miss the point by focusing only on diagnostic overlap, when the real issue is that BPD functions as a stigmatizing label that damages credibility and causes harm through structural power dynamics and biased treatments.
**Why it matters:** This challenges how we frame mental health in HCI design and team dynamics, suggesting that diagnostic labels aren't neutral descriptors but can perpetuate harm and bias in systems and interpersonal interactions.
**Method:** The author uses lived experience research and epistemic injustice analysis to examine how psychiatric classifications function as power structures rather than neutral clinical tools.
**Jargon:** CPTSD = complex post-traumatic stress disorder; BPD = borderline personality disorder; epistemic injustice = when someone's knowledge or credibility is unfairly discredited due to bias; iatrogenic harm = harm caused by medical treatment itself; dialectical behaviour therapy = a specific therapeutic approach for BPD.
Victoria Newell, Neil Anderson, Ayman Eckford et al. (10 authors) • PsyArXiv
**Finding:** Participatory Systems Mapping workshops with underserved autistic adults revealed a "Misalignment-Invalidation Cycle" where communication barriers and clinician invalidation create reinforcing loops that drive disengagement from healthcare.
**Why it matters:** This demonstrates how co-production methods can surface systemic barriers for marginalized users that traditional research approaches miss, providing a model for inclusive HCI research with underserved populations.
**Method:** The study combined co-production with Participatory Systems Mapping, using persona development, user journey mapping, and stakeholder prioritization across four workshops with 31 participants from underserved communities.
**Jargon:** Co-production means involving end users as equal partners in research design and execution rather than just as subjects; Participatory Systems Mapping is a collaborative method for visualizing complex cause-and-effect relationships in systems; causal loop diagrams show how variables influence each other in reinforcing cycles.
⚠️ Very small sample (n=6) | No control group mentioned | Single study (not replicated)
**Finding:** GLMMs with moderately complex random-effects structures produce more stable and conservative effect size estimates than ANOVA or signal detection theory when analyzing binary judgments, while avoiding assumption violations and handling missing data better.
**Why it matters:** This provides clear guidance for analyzing binary decision data in HCI studies involving user choices, team decisions, or goal achievement outcomes.
**Method:** The authors systematically compared three analytical approaches across 20 open datasets examining the illusory truth effect to demonstrate practical differences between methods.
**Jargon:** GLMM (Generalized Linear Mixed Models) handles non-normal data with random effects; SDT (Signal Detection Theory) separates decision bias from sensitivity; illusory truth effect is the tendency to believe repeated statements more than novel ones.
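To see why random effects matter for binary judgments, here is a small simulation of the data structure a GLMM is built for: each participant has their own baseline tendency to respond "true", plus a shared effect of repetition on the log-odds scale. All values are invented; no model is fitted here.

```python
import numpy as np

rng = np.random.default_rng(0)
n_subjects, n_trials = 40, 20

subject_intercepts = rng.normal(0.0, 1.0, size=n_subjects)  # per-person random intercepts
repetition_effect = 0.8                                     # fixed effect of repetition (log-odds)

rows = []
for s in range(n_subjects):
    for repeated in (0, 1):
        for _ in range(n_trials):
            logit = subject_intercepts[s] + repetition_effect * repeated
            p = 1 / (1 + np.exp(-logit))
            rows.append((s, repeated, rng.binomial(1, p)))

# A GLMM models both the fixed repetition effect and the per-subject clustering;
# averaging proportions per condition and running an ANOVA discards that structure.
print(np.array(rows)[:5])
```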
Yun-Xiao Li, Johanna Falben, Lucas Castillo et al. (8 authors) • PsyArXiv
**Finding:** Outcome utility (how good or bad a result is) does not systematically bias most people's mental simulations of risky events, but a stable subgroup shows optimistic bias, especially when monetary stakes are more prominent.
**Why it matters:** This challenges assumptions about uniform cognitive biases in risk assessment and suggests individual differences matter when designing systems or interventions that involve uncertainty and decision-making.
**Method:** Researchers used a random generation paradigm where participants mentally simulated gamble outcomes and verbalized them, then compared these with probability judgments and predictions across four experiments.
**Jargon:** Mental simulation refers to the cognitive process of imagining possible outcomes to evaluate probabilities; outcome utility is the subjective value (positive or negative) of a potential result.
⭐ Pre-registered ⚠️ No control group mentioned | Single study (not replicated) | Limited statistical detail in abstract