Leadership training in healthcare: a systematic umbrella review | BMJ Leader
The importance of effective clinical leadership has been reflected in an increase in leadership development programmes. However, there remains a lack of consensus regarding the optimal structure, content and evaluation of such programmes. This review synthesised evidence from reviews of leadership development interventions for healthcare professionals published prior to October 2024, including content, methods, evaluation strategies and impact. Title, abstract and full-text screening were conducted in duplicate by two reviewers. Data extraction was piloted by two reviewers and conducted by a single reviewer. Quality appraisal was conducted using the Risk of Bias in Systematic Reviews tool by a single reviewer, with generative artificial intelligence serving as the second reviewer. 86 systematic and non-systematic reviews met inclusion criteria. Regarding educational methods, leadership training effectiveness was associated with experiential learning, mixed-methods approaches, coaching or mentoring, longitudinal designs, goal-setting, and 360-degree feedback. Group learning and interprofessional education were noted for fostering teamwork. Programmes tailored to participants’ needs and organisational contexts showed better outcomes. Content reported to be effective included interpersonal skills, self-awareness, emotional intelligence, leadership theory, communication and teamwork. Evaluations primarily relied on self-reported measures. Training outcomes were largely positive at the individual level, with participants reporting increased confidence and competence. Organisational and clinical outcomes were less frequently assessed. The long-term impact on patient outcomes and return on investment remains uncertain. Leadership development programmes were found to enhance individual competencies. However, evidence supporting long-term, system-wide impact remains limited due to reliance on self-reported evaluations and a lack of standardised evaluation approaches.
Data are available upon reasonable request.
This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/.
In recent years, effective clinical leadership in healthcare has been recognised as a vital component in enhancing healthcare efficiency, from saving money to saving patient lives.1–7 Organisations have therefore invested more time, money and resources into leadership development.8 There remains, however, a lack of systematic evidence supporting clinical and organisational-level impact and return on investment (ROI),9 10 and there is not yet consensus on the most effective educational methods or content to support impact.11–14 While many authors have systematically reviewed healthcare leadership development literature, most of the published reviews have targeted specific subgroups of participants, and therefore some evidence is yet to be synthesised effectively.
In this umbrella review, we sought to aggregate and synthesise the literature on the effectiveness of healthcare leadership development, to determine best practices in leadership education and quantify effects on individual and organisational outcomes. Leadership is a multifaceted concept with evolving definitions, shaped by social dynamics, organisational behaviour and psychology, making its effects difficult to measure uniformly.15 In healthcare, leadership is often viewed through both individual (entitative) and collective or distributed perspectives, reflecting the pluralistic nature of healthcare organisations, where authority and influence are distributed rather than residing with a single individual.16–18 For the purpose of this review, we adopted Blake’s broad definition of leadership as “Achieving results with and through others”,19 while acknowledging the variation in, or lack of, definitions adopted across the synthesised literature.
We conducted this umbrella review to identify and synthesise reviews of leadership development programmes for healthcare professionals. To the best of our knowledge, this is the first systematic umbrella review on leadership development in healthcare. Our review was guided by the research questions:
Our review follows the Joanna Briggs Institute methodology for umbrella reviews.20 Our review protocol was prospectively made available on the Open Science Framework.21 Any deviations from the protocol are noted and justified.
We included any review published in a peer-reviewed journal which
We excluded papers only available as titles and abstracts, conference abstracts, protocols without results, and editorials or commentaries without results.
There were no restrictions on language or publication date. However, searches were conducted using English-language terms, which may have limited the inclusion of non-English publications.
We searched PubMed, MEDLINE, PsycInfo, CENTRAL and ABI Inform/PROQUEST databases on 20 June 2024. Our search terms encompassed keywords pertaining to leadership development, interventions/training, healthcare professionals, outcomes and reviews. We used Boolean operators to combine keywords and ensure that all inclusion criteria were met. We incorporated Bond University’s SearchRefinery to improve our search strategy.22 Our complete search strings for all databases are provided in online supplemental material 1: Search.
We also conducted systematic snowballing of reference lists from included studies and hand-searching of relevant journals, including Leadership Quarterly, BMJ Leader, Journal of Healthcare Management and Administrative Sciences Quarterly.
We imported search results into Covidence (www.covidence.org) for deduplication and screening. Two authors independently screened titles and abstracts and then full texts for inclusion. Any disagreements were resolved by discussion or reference to a third author. Reasons for full-text exclusions are provided in figure 1.
We used the Risk of Bias in Systematic Reviews (ROBIS) tool23 to evaluate the level of bias present within each review. ROBIS assessments were conducted independently by a single author. The risk-of-bias assessment process was amended from the initial protocol. Instead of solely relying on the primary reviewer, ChatGPT-4 (OpenAI, 2024) was used as a secondary reviewer for ROBIS assessments. This automated tool provided supplementary evaluations to enhance the reliability of the quality assessment process. However, all artificial intelligence (AI)-generated assessments were manually reviewed and validated by a human reviewer.
The ROBIS tool consists of 20 items within four main domains: study eligibility criteria, identification and selection of studies, data collection and study appraisal, synthesis and findings. Each domain has signalling questions and ends with a judgement of concerns in each domain (low, high or unclear). Finally, an overall risk-of-bias judgement is made, based on concerns across these domains.
The ROBIS tool is designed to assess systematic reviews, which typically include a formal quality appraisal. We did not adapt the tool for scoping, narrative and integrative reviews, even though these review types do not routinely incorporate formal quality appraisal as part of their standard methodology. As a result, these reviews were often rated as high risk, reflecting the absence of quality appraisal in their processes. Reviews were not excluded based on quality assessment results.
We developed and pilot-tested a standardised data extraction form in Microsoft Excel. Data extraction was piloted by two reviewers. A single reviewer then extracted data from included studies. Data extracted are presented in table 1.
Table 1
Data extracted from the included reviews
We extracted information as it was presented and analysed it using narrative synthesis. We did not contact investigators or study sponsors to provide missing data. We conducted a sensitivity analysis by including versus excluding reviews at high risk of bias (as assessed using the ROBIS tool).
As shown in figure 1, the database searches yielded 2509 results. After the removal of duplicates, titles and abstracts of 1524 records were screened. A total of 126 results were included for full-text screening, 74 of which met the inclusion criteria. An additional 12 articles were identified via hand-searching, for a final total of 86 included reviews. All included reviews and their characteristics are listed in online supplemental material 2: summary table.
Of the 86 reviews included in our umbrella review, there were 52 systematic reviews, 17 scoping reviews, 8 narrative reviews, 3 integrative reviews, 3 realist reviews, 1 focused review, 1 rapid review and 1 scoping umbrella review. The reviews predominantly focused on evaluating the effectiveness of leadership development programmes across various healthcare settings and professions. Some investigated more specific review questions, such as
Some reviews were focused on investigating specific content or methods such as
Pertinent or atypical inclusion/exclusion criteria have been highlighted in the additional information column of the summary table (online supplemental material 2: summary table).
The included reviews used at least 56 unique databases in their searches. PubMed/MEDLINE, EMBASE, CINAHL, ERIC and PsycInfo were the five most frequently used databases. These 86 reviews included a median of 24.5 articles (range 3–786). At least 1659 unique articles were included across the 86 included reviews. We are unable to report an exact figure as 9 reviews did not specify the number of included papers and 12 reviews did not provide a list of papers included.
A total of 35/86 (41%) reviews were classified as low risk of bias using the ROBIS tool. However, 9 (10%) reviews were classified as unclear risk and 42 (49%) reviews as high risk. The most common source of high bias risk was absence of quality appraisal. Reviews were also classified as high risk if screening, data extraction or quality appraisal were conducted by a single reviewer. Most ‘unclear’ risk ratings were due to a lack of detail in descriptions of search strategies, data extraction processes or quality appraisal methods. Scoring is detailed in online supplemental material 3: ROBIS assessment.
The evaluation methods used predominantly focused on a narrow set of non-validated, subjective measures. Most studies used only self-reported methods, such as questionnaires, participant feedback and self-assessed knowledge or behaviour change. Pre-intervention/post-intervention designs were common, but few studies employed observer assessments, control groups or included long-term follow-up.
There was a notable absence of standardised evaluation frameworks, making it difficult to compare results between studies or draw broader conclusions. The evaluations were often ad hoc, using tools designed in-house, and varied significantly in methodology and rigour. Only a small number of primary studies employed validated instruments for evaluation, and those that did used a diverse array of tools, hindering meaningful comparisons. This was compounded by many studies not disclosing how their evaluation tools were developed or whether they had been tested for reliability. There was a notable lack of consensus on the most effective evaluation instrument to use.
Reviews mostly reported individual-level outcomes (Kirkpatrick levels 1 and 2), such as leadership knowledge or skills, or reported positive outcomes vaguely without specifying what these were. Behavioural outcomes (Kirkpatrick level 3) were reported in 36/86 (42%) reviews. Organisational and clinical (Kirkpatrick level 4) outcomes were reported by only one-third of the reviews (30/86, 35%).10–13 25 26 28 30 31 33 38 40 41 43–59 Fourteen of these (16%) presented weak evidence only.
Several reviews called for more rigorous and consistent evaluation methods moving forward. Specific recommendations included
A total of 47/86 (55%) reviews conducted quality appraisal of their included studies. Of these, 38 (44% of all reviews) used validated tools, 3 used adapted versions of validated tools and 6 conducted independent evaluations. Overall, 19 different validated appraisal tools were used across the reviews. In order of frequency, the most commonly used were the Mixed Methods Appraisal Tool, the Medical Education Research Study Quality Instrument, the Critical Appraisal Skills Programme tool, Best Evidence Medical Education tools, the Effective Public Health Practice Project tool and the Joanna Briggs Critical Appraisal Checklist. A complete list of tools used for each review is included in online supplemental material 2: summary table.
Quality scores generally fell in the low–moderate reliability range. The reviews identified substantial heterogeneity in study designs, participant characteristics and outcome measures. Common methodological limitations of primary studies included reliance on single-group designs, convenience sampling with small sample populations and subjective, self-reported outcomes (online supplemental material 3: ROBIS assessment). Additionally, there was frequently poor reporting of demographic information and intervention design. There were limited experimental or controlled study designs and a lack of longitudinal follow-up. Many studies exhibited a high risk of bias, particularly due to confounding variables, unclear blinding procedures and selective outcome reporting.
For the above reasons, many reviews found it difficult to draw strong conclusions across studies. Some reviews excluded primary studies judged to be extremely poor quality.60 83 Other reviews emphasised higher quality evidence in their synthesis.10 26 29
There was considerable variation in the ‘population of interest’ of the included reviews (figure 2). Doctors and nurses were the most frequently included populations.
The majority of reviews reported that included studies had universally or predominantly positive outcomes, for example,62 65 often without exploring neutral or negative findings in depth, making it difficult to quantify the number of neutral or negative findings. While a small number of reviews did mention studies with negative findings, for example,12 38 or no significant effects,40 41 59 these instances were rarely examined critically. 8/86 (9%) reviews were unable to draw definitive conclusions on overall impact,42 73 76 83 88–91 but no review concluded that overall impact was negative. Of note, 4/86 (5%) reviews did not assess/discuss impact.24 64 69 92
Positive participant reactions included high satisfaction and frequent recommendations. Participants reported increased confidence, enthusiasm and recognition of the leadership training’s value in both personal and professional contexts. However, as noted in some reviews,31 33 46 81 93 insights into long-term effectiveness were limited.
Gains in leadership knowledge included improvements in decision-making,13 33 35 38 50 communication skills,10 13 14 28 30 33 35 40 50 self-awareness,10 30 33 35 50 55 self-reported leadership competence and confidence in leadership roles,31 33 52 87 conflict resolution28 30 33 and self-reflection.28 35
Behavioural changes in leadership practice were most commonly self-reported improvements, such as enhanced collaboration, time management, taking on leadership roles, research, quality improvement initiatives, reflective practices and better management of personal and professional boundaries.10 14 30 33 40 94 A small number of primary studies included observer evaluations, which corroborated participants’ self-reported behavioural changes.28 51 95 96
Reported organisational outcomes included improved team processes and culture,38 50 94 staff retention38 40 and career progression.10 53 97 Some programmes were linked to improved patient outcomes,10 25 26 44 52 85 94 such as improved patient satisfaction and reduced patient complaints, with one review specifying improvements in patient morbidity and mortality.25 Some reviews reported more broadly on ‘organisational success’10 39 41 while others reported organisational or Kirkpatrick level 4 outcomes without any specification. However, most reviews did not assess organisational outcomes, primarily due to the limitations of the primary studies in evaluating these broader impacts.
Educational content was often not reported in primary studies. When reported, content varied significantly between leadership programmes. As a result, only 19 reviews were able to draw conclusions on educational content elements associated with increased effectiveness (figure 3).
Interpersonal skills content was most frequently associated with greater impact.13 40 49 57 58 94 97 This included building professional networks, communication, collaboration, teamwork, shared decision-making, recognising team member competencies and inclusivity.
Personal development content, particularly in areas like self-direction, reflection and self-awareness, was reportedly linked to improved behavioural outcomes such as effective communication and collaboration, which benefited the organisation’s culture.28 29 49 92 97 98 Emotional intelligence was specifically found to be beneficial in three reviews.13 29 37
Leadership theory content was noted by five reviews to be effective for developing networked and team-based leadership abilities,48–50 68 94 with four specifically referencing transformational leadership.
Other content areas were less frequently evidenced (by only one or two reviews). These included the use of structured curricula or frameworks such as the Medical Leadership Competency Framework,83 99 100 adult learning theory,10 88 communication,94 97 team dynamics,40 57 critical thinking and problem-solving,13 97 change management68 and breadth of content in general.58
Many included reviews made recommendations for content elements without providing evidence of their impact or effectiveness. For example, four reviews24 37 50 58 recommended conflict management as a content area, but none linked it to improved leadership development outcomes.
Negative or neutral impacts of educational content components were rarely reported. No educational content was linked to a negative impact, although over-reliance on leadership theory without integrating other elements was found to be ineffective.92
Specific educational methods were more consistently associated with impact, with 45 reviews drawing conclusions on educational methods that were associated with increased effectiveness. Figure 4 presents the frequency of evidence-based recommendations of educational methods and content.
Experiential learning methods were associated with impact in 31 reviews.10–12 27–29 34 37 40 45–48 50 56–58 61 63 65 67 68 71 74 75 82–85 92 97 Examples of experiential learning included action learning, service/quality improvement projects, self-improvement projects, role-play and simulation. These methods were noted for promoting self-awareness and improving team dynamics, although one review found that simulation or role-play was less associated with organisational outcomes than other experiential learning methods.11
Seventeen reviews emphasised the effectiveness of using multiple learning methods,13 34 45 46 49 56 58 60 65 68 71 74 82 85 92 101 such as a combination of lectures, seminars, performance appraisals, 360-degree feedback and experiential learning. Using didactic teaching alone was found to be less effective.
Coaching and/or mentoring were associated with greater leadership outcomes.10–12 34 38 39 46 48 56–58 67 68 71 74 84 85 92 102 Both approaches were shown to improve performance, enhance self-awareness and provide strong support for other developmental activities. One-on-one coaching was frequently highlighted as particularly effective, although group coaching also offered valuable benefits.38 One review found that coaching and mentoring were more often linked to achieving organisational-level outcomes.12
Ten reviews reported that longitudinal intervention designs, characterised by multiple sessions spread over time, were associated with lasting behavioural change, while short interventions were found to be less effective.34 37 40 48 49 56 58 63 75 100 However, there was no clear consensus on the ideal duration or number of sessions. The time between sessions was seen as valuable for allowing participants to reflect on their experiences and apply newly acquired skills in practice.
The use of goal-setting and self-development projects, ideally with opportunities for self-reflection, performance appraisal and 360-degree feedback, was noted to be beneficial in developing self-awareness, a crucial component of leadership capability.10 11 34 42 46 48 49 58 67 68 71 75 92 101
Learning group composition was frequently discussed. Six reviews reported that the inclusion of group-learning activities increased effectiveness.34 56 61 85 98 101 Small group teaching was favoured among three reviews.56 83 85 However, few reviews were able to make evidence-based recommendations on group size and vastly different class sizes were reported among the included studies. Reviews often reported a predominance of uni-professional teaching,11 78 85 though interprofessional or multidisciplinary learning approaches were found to be important for enhancing both teamwork and clinical outcomes.29 45 50 58 82 94
Four reviews emphasised the importance of conducting a thorough needs assessment and ensuring stakeholder engagement when designing the curriculum. Tailoring interventions to suit the specific management and leadership needs of participants and organisational context was reportedly associated with greater outcome effect.40 48 49 101
Setting for learning delivery varied widely. Use of online teaching/E-learning was commonly reported, with many programmes combining face-to-face and online formats.13 26 32 38 42 One review reported that such blended formats were effective in allowing participants to apply skills in real-world settings.82 Interventions delivered exclusively online were not as effective in producing outcomes as those with at least some degree of face-to-face learning.34 Face-to-face courses were noted for their ability to facilitate peer feedback and networking opportunities, and for their effectiveness in driving learning, behavioural change and professional development.34 49
Interventions that combined internal and external faculty were theorised to be more effective than those led solely by internal or external faculty11 40 49 88 as they leverage both contextual knowledge and leadership expertise.
We compared the methods and content findings from all 86 included reviews with those from the 35 low-risk (high-reliability) reviews. There was minimal difference between the conclusions of the low-risk reviews and the complete data set. Most educational methods and content elements demonstrated consistent proportions of reviews reporting supporting evidence across both groups. However, experiential learning was associated with impact in a greater proportion of all reviews (36%) compared with low-risk reviews (26%). Table 2 presents the programme elements supported by high-reliability reviews.
Table 2
‘Gold-standard’ programme elements
Aggregating and synthesising findings from 86 reviews on leadership development programmes allowed us to identify significant trends, limitations and areas for improvement. Only 35 reviews were classified as low risk of bias, suggesting that many existing reviews of leadership development interventions may not provide fully reliable conclusions due to methodological weaknesses. Evaluations predominantly relied on self-reported measures and lacked long-term follow-up. Programmes demonstrated encouraging impacts, particularly at the level of participant learning and satisfaction (Kirkpatrick levels 1 and 2). Higher levels, such as organisational and clinical outcomes (Kirkpatrick level 4), were assessed in only one-third of reviews, although positive impacts were generally observed in the studies that did measure them. 19/86 (22%) reviews reported evidence linking specific content areas to improved outcomes. Content focused on interpersonal skills, self-awareness, leadership theory, emotional intelligence, communication and teamwork was reported to increase effectiveness. Educational methods were supported by a larger proportion of reviews: experiential learning, multiple learning methods, coaching or mentoring, goal-setting, 360-degree feedback, and tailored and longitudinal designs were often associated with increased programme effectiveness. Group learning and interprofessional education were noted for fostering teamwork in some reviews.
A striking finding of this review was the lack of a shared theoretical foundation in leadership training. Few studies explicitly stated the leadership theories underpinning their interventions, leaving it unclear whether programmes aimed to develop individual, collective or system-wide leadership capabilities. This absence of theoretical clarity likely shaped how programmes were designed, delivered and evaluated, contributing to inconsistencies in reported outcomes;16 17 without a clear conceptual framework, leadership development risks being fragmented, difficult to compare across studies and misaligned with organisational needs.
Another key gap was the limited exploration of mechanisms of impact. While many reviews reported positive outcomes, few explored how specific content or methodologies drove behavioural or organisational change. As higher quality reviews suggested, assuming causality based on association is problematic, especially given the variability and flaws in evaluation methods. An exception may be programmes incorporating action learning or leadership projects, which were more directly linked to observable outcomes.
Tailoring interventions to participant and organisational needs, along with stakeholder engagement in curriculum design, was associated with greater impact. This supports the notion that explicitly stating level 4 outcomes as programme goals, and embedding them in evaluation frameworks, can promote accountability among faculty and participants, encouraging alignment with system-level objectives.
Despite this, organisational context, including where and how leadership training is delivered, was rarely examined as a factor. Some reviews suggested that embedding training within healthcare institutions, rather than delivering it externally, might enhance relevance and impact. However, there was little empirical evidence supporting this. More broadly, leadership is increasingly recognised as a context-dependent process, shaped by organisational structures, workplace culture and system-wide dynamics.103 104 Yet, much of the existing research remains leader-centric, often neglecting how environmental and situational factors influence leadership effectiveness. This ‘context deficit’ in leadership research has been linked to an overemphasis on individual traits and styles, rather than examining how leadership emerges in response to specific organisational challenges.103
Future research should aim to address these gaps by clearly defining leadership within each programme, specifying whether the focus is on individual, team-based or system-wide leadership and aligning this with organisational goals.89 Greater use of structured frameworks, such as context–mechanism–outcome models40 103 or theories of change, could provide a more nuanced understanding of how specific elements of leadership development contribute to intended outcomes. Additionally, research should explore the role of organisational context, including factors such as institutional support, integration within existing systems and participant motivation, to better understand how these influence programme success.
It is notable that medical students comprised only 13% of participants in leadership development interventions. While this is a single data point, it raises important questions about the extent to which leadership is integrated into medical school curricula. Given the growing recognition of leadership as a core competency for healthcare professionals, this under-representation may indicate a gap in formal leadership training at the undergraduate level. Developing future healthcare leaders requires early exposure to leadership principles, yet the data suggest that current approaches may not sufficiently prioritise this need. Integrating leadership training earlier in medical education and ensuring it is tailored to the realities of clinical practice could help address this shortfall. Future research should examine how programme methods and content might be optimally aligned with leadership level.
Finally, reliance on self-reported, short-term outcomes limited our ability to assess the long-term, real-world impact of leadership development interventions. Authors emphasised the need for more rigorous evaluations, advocating for validated tools, longitudinal assessments and external observer ratings to better capture both immediate and sustained effects. Yet, these approaches remain consistently underused. To ensure leadership training delivers meaningful impact, there is an urgent need for a standardised evaluation framework that integrates validated measures and best practices while remaining flexible enough to accommodate the contextual nuances of each programme.
To our knowledge, this study is the first systematic umbrella review of leadership development interventions in healthcare settings. It covers a wide range of review types, including both quantitative and qualitative studies, and provides a broad and well-rounded perspective on leadership development interventions. Additionally, the use of the ROBIS tool to assess methodological quality strengthens the reliability of our findings by systematically identifying potential biases in the included reviews.
Data extraction was performed by a single reviewer, introducing a potential source of bias. This approach was necessitated by the resource and time constraints associated with the process. However, extracting data verbatim using a standardised format likely mitigated this risk. ROBIS assessments were also conducted by a single reviewer, with generative AI (ChatGPT-4, OpenAI, 2024) used as a second reviewer. Emerging literature suggests that AI tools are increasingly used in systematic reviews, demonstrating promising results when combined with human reviewers.105 However, these tools currently lack the nuanced judgement and contextual understanding required for independent assessments.106 107 In this review, AI was integrated cautiously, with outputs carefully scrutinised by the human reviewer to maintain the integrity of the review process.
Many of the included reviews did not conduct formal critical appraisal of their primary studies, and the quality of the primary studies varied, with common issues such as small sample sizes, lack of control groups and subjective, self-reported outcomes. This variability in study quality may affect the strength of the conclusions drawn from this review.
Neutral or negative impacts were rarely reported or explored in reviews, likely due to a lack of reporting in primary studies. Published literature is often biased towards reporting positive or successful outcomes. A notable challenge in medical education research is the tendency to publish studies that legitimise existing programmes rather than critically evaluate their effectiveness.108 109 We must therefore consider the possibility of overestimating the effectiveness of leadership development interventions.
This umbrella review highlights the growing emphasis on leadership development in healthcare while underscoring persistent gaps in programme design, content and evaluation. While interventions consistently report positive individual-level outcomes, evidence for sustained organisational and clinical impact remains limited due to reliance on self-reported measures and a lack of standardised evaluation frameworks, which is consistent with wider literature.104 110–113 The findings suggest that effective programmes incorporate experiential learning, coaching or mentoring, longitudinal structures and tailored content, particularly in areas such as interpersonal skills, emotional intelligence and communication. However, the absence of rigorous long-term assessments, objective outcome measures and the programmes’ theoretical and contextual underpinnings presents a challenge to fully understanding their impact.
Future research should prioritise the development and adoption of standardised, validated evaluation tools based on the latest leadership theory developments that allow for comparability across studies while remaining flexible enough to accommodate diverse healthcare contexts. This could be achieved by integrating existing competency frameworks with validated leadership assessment instruments, ensuring alignment with key organisational priorities. Greater use of external observer ratings, 360-degree feedback and mixed-methods evaluation approaches would strengthen the evidence base. Additionally, embedding leadership evaluation within routine clinical and organisational data collection could facilitate the long-term tracking of outcomes without placing excessive burden on participants. To obtain high-quality evidence, researchers should aim to move beyond descriptive studies and small-scale evaluations, instead designing controlled, longitudinal studies that assess the sustained impact of leadership development on team performance, staff retention and patient care. Given that leadership is inherently relational—achieved with and through others19—evaluations must also consider outcomes at the team and organisational levels to fully capture the impact of leadership development. Strengthening the methodology of future research will be essential to ensuring that leadership training delivers tangible benefits for healthcare professionals, organisations and patients.