Risk factors for neuroendocrine neoplasms: protocol for a case–control study based on a record linkage of registry and claims data
Recent studies showed an increase in neuroendocrine neoplasms, especially for the digestive tract. Several risk factors have been suggested to explain this increase, including a family history of cancer, tobacco smoking, alcohol consumption and metabolic disorders such as diabetes and obesity. Another risk factor may be depressive disorders, which could increase the risk of neuroendocrine neoplasms either directly or mediated through associated risk behaviours and/or antidepressant medication. Here, we outline the design of our study to identify the risk factors for neuroendocrine neoplasms in Germany.
A case–control study of the resident population of Bavaria, the second most populous federal state in Germany, based on a record linkage of data from the Bavarian Cancer Registry and data from the Bavarian Association of Statutory Health Insurance Accredited Physicians. Cases have a diagnosis of a malignant neuroendocrine neoplasm, either of the bronchopulmonary system or the gastroenteropancreatic system, in the period from 2021 to 2023. Controls are sampled from the non-cases and matched on sex, birth year (in 5-year intervals) and time of diagnosis (by calendar quarter). Risk factor prevalence of cases and controls is assessed on the basis of assured outpatient diagnoses, that is, diagnoses documented in at least 2 out of 4 consecutive quarters in the 16 quarters preceding the diagnosis of a neuroendocrine neoplasm. The analysis uses conditional logistic regression to estimate ORs and 95% CIs.
This study protocol was approved by the Ethics Committee of the Bavarian State Chamber of Physicians (reference number: 24008). Approval by the supervisory authority has been obtained from the Bavarian State Ministry of Health, Care, and Prevention (reference number: G35h-A1080-2023/20-2), and the Bavarian Data Protection Commissioner raised no concerns after presentation of the study protocol (reference number: DSB/7-692/1-275). The results of the case–control study will be presented at national and international conferences and published as scientific articles in peer-reviewed journals.
Data sharing is not applicable as no datasets were generated and/or analysed for this study.
This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/.
Neuroendocrine neoplasms (NENs) are malignancies of neuroendocrine cells.1 Neuroendocrine cells can be found throughout the body wherever there is epithelium, that is, excluding the central nervous system, bones and soft tissue. NENs are rare and heterogeneous tumours, which most commonly arise in the gastroenteropancreatic system (GEPS) and the bronchopulmonary system (BPS).1–3 NENs are categorised according to their differentiation into well-differentiated neuroendocrine tumours (NETs) and poorly differentiated neuroendocrine carcinomas (NECs).4 NETs can be further subdivided based on grade, while NECs are further categorised into small-cell and large-cell carcinomas. In addition, there are mixed neuroendocrine-non-neuroendocrine neoplasms.
Recent studies from the USA and Germany showed an increase in NENs, especially for the digestive tract.5–9 According to an analysis of data from the US Surveillance, Epidemiology and End Results programme, age-standardised incidence rates (US standard population 2000) significantly increased for most gastrointestinal sites between 1975 and 2008.7 For instance, the age-adjusted incidence rate for rectal NENs increased from 0.1 in 1975 to 1.1 per 100 000 persons in 2008.7 The risk for NEN is known to increase with age.2 Several other risk factors have been suggested by a limited number of studies, including a family history of cancer, tobacco smoking, alcohol consumption and metabolic disorders such as diabetes and obesity.2 10 Apart from this, there may be other risk factors that have not been confirmed yet, such as depressive disorders, which could increase the risk of NENs either directly or mediated through associated risk behaviours and antidepressant medication. Kenner,11 for instance, discusses the role of depressive disorders in pancreatic cancer. In the past two decades, several studies have explored the relationship between depression and the metabolic syndrome,12–14 which is a cluster of risk factors, including raised blood pressure, dyslipidaemia, raised fasting glucose and central obesity.15 Prospective cohort studies observed a bidirectional association for depression and the metabolic syndrome.13 Both diseases, depression and the metabolic syndrome, are linked with insulin resistance and chronic inflammation involving the endocrine and immune systems.13 14
The aim of this study is to identify the risk factors for NENs based on data of the Bavarian Cancer Registry, Germany, to help understand the increase in NENs.
The study design is a case–control study based on data from the population-based Bavarian Cancer Registry and data from the Bavarian Association of Statutory Health Insurance Accredited Physicians (KVB, German: Kassenärztliche Vereinigung Bayerns). The Bavarian Cancer Registry is based on mandatory notifications by physicians and healthcare providers regarding the diagnosis and treatment of cancer.16 The KVB collects, on a quarterly basis, diagnosis and treatment data related to its main task, that is, ensuring and reimbursing outpatient medical care and psychotherapy for patients with Statutory Health Insurance (GKV, German: Gesetzliche Krankenversicherung) in Bavaria. As is common for claims data, however, the treatment data of the KVB are limited to fee schedule items, including non-specific flat fees, which is why we will focus on the diagnosis data. Bavarian Cancer Registry data and KVB data are linked by pseudonymised record linkage following the probabilistic linkage procedure established by the German cancer registries.17 18 The procedure converts KVB identity data into unique tokens (pseudonymisation), which are then compared with the tokens already present in the Bavarian Cancer Registry by probabilistic linkage.18 After linkage, the pseudonyms are removed, resulting in an anonymous data set.
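The two steps of the linkage, tokenisation of identity data and probabilistic comparison of tokens, can be illustrated with a minimal sketch. It is not the procedure of the German cancer registries: the choice of fields, the keyed hash, the agreement probabilities and the threshold are all hypothetical and only show the general idea of a Fellegi-Sunter type match weight computed on pseudonymised fields.

```python
import hashlib
import hmac
import math

# Hypothetical secret key; real key management is outside this sketch.
SECRET_KEY = b"demo-key-not-for-production"

# Hypothetical probabilities per field:
# m = P(field agrees | same person), u = P(field agrees | different persons).
FIELD_PARAMS = {
    "surname": (0.95, 0.01),
    "given_name": (0.95, 0.02),
    "birth_date": (0.98, 0.001),
    "postcode": (0.90, 0.05),
}


def tokenise(record):
    """Replace each identity field by a keyed hash of its normalised value."""
    tokens = {}
    for field in FIELD_PARAMS:
        value = record[field].strip().lower().encode("utf-8")
        tokens[field] = hmac.new(SECRET_KEY, value, hashlib.sha256).hexdigest()
    return tokens


def match_weight(tokens_a, tokens_b):
    """Sum of log2 agreement and disagreement weights over all fields."""
    total = 0.0
    for field, (m, u) in FIELD_PARAMS.items():
        if tokens_a[field] == tokens_b[field]:
            total += math.log2(m / u)
        else:
            total += math.log2((1 - m) / (1 - u))
    return total


# Fictitious example records.
registry = tokenise({"surname": "Muster", "given_name": "Erika",
                     "birth_date": "1960-04-12", "postcode": "80331"})
claims_same = tokenise({"surname": "muster ", "given_name": "Erika",
                        "birth_date": "1960-04-12", "postcode": "80333"})
claims_other = tokenise({"surname": "Beispiel", "given_name": "Max",
                         "birth_date": "1975-09-30", "postcode": "90402"})

THRESHOLD = 10.0  # hypothetical cut-off separating links from non-links
print(match_weight(registry, claims_same) > THRESHOLD)
print(match_weight(registry, claims_other) > THRESHOLD)
```

In this sketch, a record pair with one deviating field (here the postcode) can still exceed the threshold, which is the main difference from exact matching on a single pseudonym.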
The setting is the resident population of Bavaria in the period from 2021 to 2023. Bavaria is the second most populous federal state in Germany, with about 13 million residents. Though the Bavarian Cancer Registry covers the complete resident population of Bavaria, the source population of the study is limited to persons who are insured with the GKV and had at least one outpatient physician contact between 2021 and 2023 within Bavaria. Persons who are not insured in the GKV as well as persons without outpatient physician contact between 2021 and 2023 within Bavaria are not included in the KVB data. In Bavaria, about 11.5 million residents, which is 85% of all residents, are insured with the GKV.19
Eligible cases are defined as persons with a malignant NEN of the BPS or GEPS diagnosed in the period from 2021 to 2023. The definition of a malignant NEN is based on the fifth edition of WHO’s Classification of Tumours, also known as the WHO Blue Books,4 20 and the definition of BPS and GEPS is based on the International Statistical Classification of Diseases and Related Health Problems, 10th Revision (ICD-10).21 Detailed information on the eligible combinations of histology code and tumour site is provided in online supplemental file 1. The selection of cases is based on the record linkage of Bavarian Cancer Registry and KVB data, that is, persons with a malignant NEN in the period from 2021 to 2023 according to the Bavarian Cancer Registry data who had GKV insurance as well as an outpatient physician contact between 2021 and 2023. Eligibility is not restricted by age.
Eligible controls are defined as non-cases in the same source population as for cases, that is, the KVB data. Selection of controls is done by random sampling.
Cases will be matched 1:2 with controls on sex, birth year (5-year intervals) and time of diagnosis (by calendar quarter), that is, two controls will be matched to each case.
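A minimal sketch of such a 1:2 matching is given below. It assumes, for illustration only, that controls are drawn at random without replacement from non-cases of the same sex and 5-year birth cohort and that each control inherits the diagnosis quarter of its case as index quarter. The record layout and all field names are hypothetical.

```python
import random
from collections import defaultdict


def cohort(birth_year):
    """Lower bound of the 5-year birth cohort, e.g. 1963 -> 1960."""
    return birth_year // 5 * 5


def match_controls(cases, non_cases, ratio=2, seed=1):
    """Draw `ratio` controls per case at random, without replacement,
    from non-cases of the same sex and 5-year birth cohort."""
    rng = random.Random(seed)
    pool = defaultdict(list)
    for person in non_cases:
        pool[(person["sex"], cohort(person["birth_year"]))].append(person)
    for candidates in pool.values():
        rng.shuffle(candidates)

    matched = []
    for case in cases:
        candidates = pool[(case["sex"], cohort(case["birth_year"]))]
        if len(candidates) < ratio:
            raise ValueError(f"too few eligible controls for case {case['id']}")
        for _ in range(ratio):
            control = dict(candidates.pop())
            # Assumption: controls inherit the diagnosis quarter of their case.
            control["index_quarter"] = case["quarter"]
            control["matched_to"] = case["id"]
            matched.append(control)
    return matched


# Fictitious example data.
cases = [{"id": "case-1", "sex": "F", "birth_year": 1963, "quarter": "2022Q3"}]
non_cases = [
    {"id": "p1", "sex": "F", "birth_year": 1960},
    {"id": "p2", "sex": "F", "birth_year": 1964},
    {"id": "p3", "sex": "F", "birth_year": 1965},
    {"id": "p4", "sex": "M", "birth_year": 1962},
    {"id": "p5", "sex": "F", "birth_year": 1961},
]
controls = match_controls(cases, non_cases)
print(sorted(c["id"] for c in controls))
```

The inherited index quarter gives cases and controls the same reference point for any look-back period.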
The recruitment of cases and controls is planned to start on 15 May 2025 and end on 31 August 2025, followed by the data analysis. The recruitment comprises the data extraction at the Bavarian Cancer Registry and the KVB, the record linkage of both data sets and the merging of outcomes, exposures and confounders.
Outcomes
Primary outcomes are NEN of the BPS and the GEPS as defined by eligible combinations of histology code and tumour site (see online supplemental file 1). Secondary outcomes are the two subcategories of NEN, that is, NET and NEC, and the tumour characteristics stage and grade.
Exposures and potential confounders
The following exposures and potential confounders will be assessed:
Exposures: disease-related
Exposures: sociodemographic
Potential confounders: sociodemographic and healthcare-related
Birth year (5-year intervals).
Sex.
Rurality of the residence district.
Healthcare utilisation frequency.
Potential confounders: time of NEN diagnosis
Time of diagnosis (calendar quarter).
Primary outcomes as well as their subcategories NET and NEC are measured using the ICD-10 codes and histological information in the data of the Bavarian Cancer Registry (see online supplemental file 1). Stage, grade, previous malignant neoplasms, birth year (5-year intervals), sex and time of diagnosis (by calendar quarter) are also taken from the Bavarian Cancer Registry data. Stage and grade are measured according to the TNM classification of the Union of International Cancer Control.22 Previous malignant neoplasms include all neoplasms with ICD-10 codes C00-C97 (malignant neoplasms), except C44 (other malignant neoplasms of skin).21 For cases and controls, the prevalence of the disease-related exposures, apart from previous malignant neoplasms, is assessed by specific ICD-10 diagnoses based on the KVB data (see online supplemental file 1 for further details). Only assured ICD-10 diagnoses will be considered,23 that is, diagnoses that were recorded in at least 2 out of 4 consecutive quarters in the 16 quarters preceding the quarter of the NEN diagnosis for cases as well as controls, which are matched by quarter of NEN diagnosis. Healthcare utilisation frequency is operationalised based on the use of outpatient medical care by calendar quarter and physician group, as defined by the KVB. Area deprivation of the residence municipality of cases and controls is measured by the Bavarian Index of Multiple Deprivation, which is based on official data.24 In particular, we use the deprivation quintile of the residence municipality at the time of NEN diagnosis. Rurality of the residence district at the time of NEN diagnosis is based on the dichotomous categorisation of districts into urban area and rural area by the German Federal Institute for Research on Building, Urban Affairs and Spatial Development.25
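The rule for assured diagnoses can be written down as a short sketch. Quarters are encoded here as consecutive integers, which is an assumption of this illustration, as are the function and parameter names. The optional `lag` parameter, also hypothetical, excludes the most recent quarters before the index quarter, which would correspond to a stricter look-back window.

```python
def quarter_index(year, quarter):
    """Encode a calendar quarter as a consecutive integer (2020 Q1 -> 8080)."""
    return year * 4 + (quarter - 1)


def has_assured_diagnosis(coded_quarters, index_quarter,
                          lookback=16, lag=0, window=4, min_hits=2):
    """True if the diagnosis was coded in at least `min_hits` of some `window`
    consecutive quarters within the `lookback` quarters before the index
    quarter, ignoring the `lag` quarters directly before the index quarter."""
    first = index_quarter - lookback
    last = index_quarter - 1 - lag
    hits = {q for q in coded_quarters if first <= q <= last}
    # Slide a window of `window` quarters over the eligible period.
    for start in range(first, last - window + 2):
        if sum(q in hits for q in range(start, start + window)) >= min_hits:
            return True
    return False


index = quarter_index(2022, 1)  # quarter of the NEN diagnosis
coded = [quarter_index(2020, 1), quarter_index(2020, 3)]

print(has_assured_diagnosis(coded, index))          # True
print(has_assured_diagnosis(coded, index, lag=8))   # False
print(has_assured_diagnosis(coded[:1], index))      # False
```

In this example, the diagnosis coded in two quarters of 2020 counts as assured for an index quarter in early 2022, but not if the eight quarters before the index quarter are excluded, and a diagnosis coded only once never counts.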
The case–control study is based on a record linkage of registry data, claims data and administrative data. These data sources are, unlike survey data, not prone to recall bias. The potential for selection bias is considerably reduced by a small number of matching variables, that is, birth year, sex and time of NEN diagnosis, as well as the use of an almost unselected source population for cases and controls, that is, the KVB data. The KVB data cover all persons with GKV (about 85% of all residents in Bavaria) and at least one outpatient physician contact in Bavaria within the study period from 2021 to 2023. According to the KVB, 94% of all GKV insured persons had, for instance, at least one contact with a general practitioner in 2021,26 so that about 80% of all residents are included in our source population. It is known that the proportion of persons with GKV, who use outpatient care, is higher for women compared with men and lower for younger persons compared with older persons.23 Regarding potential differential misclassification bias, assessment of disease-related exposures is done in the 16 quarters preceding the quarter of the NEN diagnosis to ensure that the prevalence and number of disease-related exposures are not influenced by potentially increased clinical investigation related to the NEN diagnosis. The potential of reverse causality, that is, subclinical NEN causing disease-related exposures such as depression and not vice versa, is addressed by sensitivity analyses excluding the eight quarters preceding the quarter of the NEN diagnosis from the assessment of disease-related exposures.
Confounding is controlled by matching for birth year, sex and time of NEN diagnosis. We will assess the matching process by comparing the distributions of the matching variables between cases and controls graphically and by summary statistics of the distributions. Conditional logistic regression will be employed to account for matching of controls to cases. Furthermore, differences in existing infrastructure of outpatient care between urban areas and rural areas may be associated with the prevalence of disease-related exposures. This potential confounding is additionally controlled in sensitivity analyses by adjusting models for the rurality of the residence district.
Based on the data of the population-based Bavarian Cancer Registry (until 27 March 2025), the number of incident NEN cases in Bavaria, Germany, is 5943 cases for the study period from 2021 to 2023, of which 3274 were NENs of the BPS and 2669 were NENs of the GEPS. Taking into account that the KVB data cover 85% of all residents in Bavaria, Germany, 5051 cases may be expected at maximum after the record linkage of the Bavarian Cancer Registry and KVB data. As not all residents with incident NEN between 2021 and 2023 may receive outpatient treatment in Bavaria and as the record linkage may not identify all possible NEN cases in the KVB data, a record linkage for 80% of all cases is probably more realistic. Based on this assumption, about 4750 cases and 9500 matched controls would be expected. With more than 4500 expected cases (about 2620 BPS cases and 2130 GEPS cases) and more than 9000 expected controls, the linked data set has a considerable size and is the best available approach to exploit outpatient data for a risk factor analysis for NEN. Even for GEPS tumour sites, such as the small intestine and the pancreas, we may expect 590 and 465 cases, respectively.
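The arithmetic behind these expectations can be retraced in a few lines. The sketch only uses the figures stated in the text. Rounding down the upper bound and rounding the site-specific counts to whole cases are assumptions of this illustration, so the results agree with the rounded figures in the text only approximately.

```python
import math

registry_cases = {"BPS": 3274, "GEPS": 2669}  # incident cases 2021 to 2023
gkv_share = 0.85          # share of residents insured with the GKV
gp_contact_share = 0.94   # GKV insured with at least one GP contact in 2021
linkage_share = 0.80      # assumed share of all cases that can be linked
controls_per_case = 2

total_cases = sum(registry_cases.values())
source_population_share = gkv_share * gp_contact_share
max_linked_cases = math.floor(total_cases * gkv_share)
expected_cases = {site: round(n * linkage_share)
                  for site, n in registry_cases.items()}
expected_total = sum(expected_cases.values())
expected_controls = controls_per_case * expected_total

print(total_cases)                        # 5943
print(round(source_population_share, 3))  # 0.799
print(max_linked_cases)                   # 5051
print(expected_cases)                     # {'BPS': 2619, 'GEPS': 2135}
print(expected_total, expected_controls)  # 4754 9508
```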
The descriptive analysis, stratified by BPS and GEPS, includes the calculation of frequencies and percentages for categorical variables, the mean and median (with SD and IQR, respectively) for birth year, as well as bivariate 2×2 tables for the combinations of case–control status and disease-related exposures. This allows us to investigate shared exposures of cases and controls. In addition, the bivariate analyses are stratified by the matching variables and the subcategories of the outcome, that is, NET and NEC. For the stratified analyses, ORs will be calculated according to Mantel and Haenszel.27
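For reference, the Mantel-Haenszel estimator pools the stratum-specific 2×2 tables into a single OR by weighting each stratum with the inverse of its size. The following sketch uses fictitious counts, and the function name is hypothetical.

```python
def mantel_haenszel_or(strata):
    """Pooled odds ratio over strata of 2x2 tables.

    Each stratum is a tuple (a, b, c, d) with
    a = exposed cases,   b = exposed controls,
    c = unexposed cases, d = unexposed controls.
    """
    numerator = denominator = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        if n == 0:
            continue  # empty strata carry no information
        numerator += a * d / n
        denominator += b * c / n
    return numerator / denominator


# Fictitious counts for two strata, e.g. women and men.
strata = [(10, 20, 30, 120), (15, 10, 25, 50)]
print(round(mantel_haenszel_or(strata), 3))      # 2.429
print(round(mantel_haenszel_or(strata[:1]), 3))  # 2.0
```

With a single stratum, the estimator reduces to the ordinary cross-product ratio ad/bc.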
Conditional logistic regression models will be estimated separately for BPS and GEPS to obtain estimates of ORs (and 95% CIs) for multiple exposures. All models will adjust for birth year (5-year intervals), sex and time of diagnosis (calendar quarter). After stepwise inclusion of exposure variables, interaction terms will be added to the models to investigate effect modification, for instance, between depression and the metabolic syndrome. Models will be compared based on Akaike information criterion and validated by examining their residual plots. The assumption of linearity in the predictors is assessed using additive models with P-splines.28 All models will also be stratified by the NEN subcategories NET and NEC, and all analyses for GEPS will additionally be stratified by tumour site, stage and grade.
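To illustrate how conditional logistic regression accounts for the matching, the following sketch maximises the conditional likelihood for a single exposure by Newton-Raphson iteration. Each matched set contributes the probability that, among its members, the case is the one who developed the disease, so the matching variables cancel out. This is a didactic example with fictitious data and hypothetical function names. The protocol does not name the software to be used, and the actual analysis with multiple exposures would rely on an established statistical package.

```python
import math


def fit_conditional_logit(matched_sets, iterations=25):
    """Estimate the log odds ratio for one exposure from matched sets.

    matched_sets: list of (case_exposure, [control_exposures]).
    Returns (beta, standard_error).
    """
    beta = 0.0
    for _ in range(iterations):
        score = information = 0.0
        for x_case, x_controls in matched_sets:
            xs = [x_case] + list(x_controls)
            weights = [math.exp(beta * x) for x in xs]
            total = sum(weights)
            mean = sum(w * x for w, x in zip(weights, xs)) / total
            mean_sq = sum(w * x * x for w, x in zip(weights, xs)) / total
            score += x_case - mean              # first derivative
            information += mean_sq - mean ** 2  # minus second derivative
        beta += score / information             # Newton-Raphson update
    return beta, 1 / math.sqrt(information)


# Fictitious 1:2 matched sets with a binary exposure (1 = exposed).
matched_sets = (
    [(1, [0, 0])] * 20 + [(1, [1, 0])] * 10 +
    [(0, [1, 0])] * 15 + [(0, [1, 1])] * 5 +
    [(0, [0, 0])] * 50  # concordant sets do not contribute
)
beta, se = fit_conditional_logit(matched_sets)
lower, upper = math.exp(beta - 1.96 * se), math.exp(beta + 1.96 * se)
print(f"OR = {math.exp(beta):.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

For 1:1 matched pairs, the estimate from this routine reduces to the ratio of the two types of discordant pairs, which offers a simple plausibility check.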
Several sensitivity analyses will be performed. To address the potential of reverse causality, the first sensitivity analysis will measure disease-related exposures based on the fourth and third year preceding the quarter of the NEN diagnosis so that the eight quarters before the NEN diagnosis are excluded. The second sensitivity analysis refers to the study period, which partially coincides with the COVID-19 pandemic that disrupted healthcare utilisation and diagnosis patterns29 30 and may have led to an underdiagnosis of both outcomes and disease-related exposures. To control for this potential confounding, we will add healthcare utilisation frequency to the regression models. In a third sensitivity analysis, rurality of the residence district and area deprivation of the residence municipality will be added to the regression models. Rurality of the residence district may be associated with the likelihood of receiving an assured diagnosis of disease-related exposures as well as outcome measures. Area deprivation of the residence municipality may be linked to patterns of disease-related exposures and, thus, influence the outcome measures.
Missing values may occur in the variables stage and grade of cases. If the number of missing values exceeds an acceptable threshold, multiple imputation by chained equations will be applied.31 All variables, including the matching variables, will be incorporated into the imputation model.
Patients and/or the public were not involved in the design, conduct, reporting or dissemination plans of this research.
This large, population-based case–control study fully exploits the potential of linking cancer registry data with outpatient data to investigate the risk factors for NEN, which is currently a rare disease, and to help understand its increase in Germany. Thus, the study will add to previous studies from other countries, many of which had a small study size, had a hospital-based study design, analysed only selected tumour sites and did not include information on stage and grade.2 The study will also investigate exposures, such as depression and metabolic syndrome, which have not been extensively studied so far.2 10 32 An additional advantage of the study design is that the assessment of exposures does not rely on self-report but is based on assured outpatient diagnoses in the 4 years preceding the NEN diagnosis and is thus subject to neither recall bias nor subjective perception. With regard to the outcomes, the study will, unlike previous studies, additionally stratify for NET and NEC based on high-quality cancer registry data, allowing for the analysis of potential differences in risk factors between these two entities.
Limitations concern the observation of specific causal pathways as well as the measurement of disease-related exposures, as the outpatient data do not include information on disease onset and as ICD-10 codes lack diagnostic thresholds. To our knowledge, however, these are the best currently available data in Germany to study such a large and unselected population. The drawback is the lack of the detailed information needed to observe specific causal pathways. Regarding the measurement of disease-related exposures, we will use assured ICD-10 diagnoses, that is, diagnoses recorded in at least 2 out of 4 consecutive quarters, to limit misclassification. Moreover, there is a small potential for collider bias due to restricting the analysis to persons with at least one outpatient physician contact within the 12 quarters of 2021 to 2023. This effect, however, is expected to be minimal, as in 2021 alone 94% of all GKV insured persons had at least one contact with a general practitioner,26 a figure that does not yet include other specialisations or the years 2022 and 2023.
The results of this study should provide ORs for potential risk factors of NEN and, thus, help understand the recent NEN increase. The findings of the study may provide valuable insights for government policy on potential preventive measures, while also initiating further research. The study design may also serve as a flagship example of how the linkage of health data from different data sources can yield substantial epidemiological insights, especially in the case of rare diseases.
This study protocol was approved by the Ethics Committee of the Bavarian State Chamber of Physicians (reference number: 24008). Approval by the supervisory authority has been obtained from the Bavarian State Ministry of Health, Care, and Prevention (reference number: G35h-A1080-2023/20-2), and the Bavarian Data Protection Commissioner raised no concerns after presentation of the study protocol (reference number: DSB/7-692/1-275). This study is based on registry and claims data, which are collected on a legal basis without the explicit consent of the patients and which can be used for research purposes by the registry and, under certain conditions, third parties. Patient consent for a specific study is only required for the use or linkage of plain data, but not for the study protocol presented, which is based on an anonymised data set that does not contain any personal data. In accordance with point (b) of Article 14(5) of the European Union General Data Protection Regulation, it is not necessary to inform the patients in this case. The study will be conducted in accordance with the Helsinki Declaration of the World Medical Association as well as the guidelines and recommendations for ensuring good epidemiological practice.33
The data that support the findings of this study will not be publicly accessible because the study partners, that is, the Bavarian Cancer Registry and the Bavarian Association of Statutory Health Insurance Accredited Physicians, are subject to strict legal regulations regarding the disclosure of data. On reasonable request, however, the permissibility of the data provision will be reviewed by the Bavarian Cancer Registry and the Bavarian Association of Statutory Health Insurance Accredited Physicians in accordance with the applicable legal requirements.
The results of the case–control study will be presented at national and international conferences. After final analysis, the results will be published in the form of scientific articles in peer-reviewed journals. In addition, the authors will seek opportunities to share the findings with relevant stakeholders, such as clinicians in cancer centres, and the wider public by using, for instance, newsletters, press releases and social media platforms.
The study starts on 15 May 2025 with the recruitment, that is, the data extraction and record linkage of registry and claims data. The recruitment should be completed by 31 August 2025.
We are grateful to the Trust Centre of the Bavarian Cancer Registry (Dr Jana Johne, Mr Stefan Möllenkamp) for substantial support regarding ethical approval and regulatory approval. Moreover, we want to thank the Cancer Registry of North Rhine-Westphalia (Prof Stang) for commenting on an earlier version of the study protocol.