An integrated method for detecting financial statement fraud

Chandan Goyal¹*, Dr. Bharat Khurana²

1 Student, Bachelor of Commerce, Department of Commerce, Zakir Husain Delhi College, University of Delhi, New Delhi

chandan.goyalstd@gmail.com

2 Professor, Department of Commerce, Zakir Husain Delhi College, University of Delhi, New Delhi

Abstract: Financial statement fraud jeopardises corporate transparency, investor trust, and the integrity of capital markets, which calls for better detection methods. This study proposes an integrated approach to identifying financial statement fraud that merges standardised financial indicators and governance monitoring factors with behavioural signals drawn from narrative disclosures. The dataset consists of company-year observations labelled High Risk or Low Risk on the basis of a composite score, and all variables are first standardised as z-scores to guarantee comparability across scales. Descriptive statistics confirm that the financial and governance variables are properly normalised, and early comparisons show that High Risk observations exhibit stronger financial pressure signals and weaker governance monitoring. Correlation analysis shows that the final fraud risk score has a strong association with financial pressure and a modest association with the governance opportunity and narrative behaviour indicators, demonstrating that fraud risk is multifaceted. A logistic regression using only the underlying z-score variables provides interpretable insights and highlights accruals as a key predictor of high-risk classification. To further improve predictive performance, several machine learning models (SVM, Random Forest, and Decision Tree) are trained with a stratified train-test split and evaluated on a holdout set. SVM and Random Forest achieve the best accuracy when the engineered sub-scores, namely the Financial Pressure Score (FPS), Governance Opportunity Score (GOS), and Narrative Behaviour Score (NBS), are added to the z-score indicators. The results suggest that auditors, regulators, and forensic practitioners can benefit from an integrated, multi-layered approach to fraud detection, since it improves accuracy and dependability.

Keywords: Fraud Detection, Z-Score Indicators, Governance Monitoring, Narrative Behavior, Machine Learning, Fraud Risk Scoring, SVM, Random Forest

INTRODUCTION

Financial statement fraud is one of the most important problems facing modern financial systems, seriously undermining market efficiency, shareholder confidence, and the reliability of company reporting. It is the intentional manipulation, misrepresentation, or omission of financial data with the intention of misleading the company's stakeholders, including investors, regulators, creditors, and auditors (Beneish, M. D., 2017, 57–82). In today's intensely competitive and globalised economy, businesses are under constant pressure to meet their financial obligations, turn a profit, raise the value of their stock, and maintain a favourable reputation in the marketplace. This pressure often comes from investors, legislation, performance requirements, and executive compensation schemes that depend on financial success. As a result, managers and corporate executives may resort to unethical practices such as earnings management, concealment of liabilities, improper revenue recognition, and misleading disclosures in order to present an artificially improved financial position. In addition to deceiving stakeholders, these actions have a significant negative economic impact: they erode investor wealth, create systemic instability in financial markets, and can even lead to business bankruptcies. The complexity of modern corporate transactions, together with the evolution of financial instruments and accounting practices, has made fraud more difficult to track down. Standard auditing processes and regulatory systems rely on sampling techniques, rule-based assessments, and periodic reviews, so they may fail to detect latent manipulation tendencies or emerging fraud risks in time (Chen, J., 2018, 1009–1028).
Numerous high-profile corporate scandals that have been revealed globally have demonstrated the shortcomings of conventional fraud detection techniques as well as the system's inefficiencies in terms of governance, monitoring, and analysis. These failures highlight the urgent need for more sophisticated, proactive, and data-driven systems that may detect financial misconduct early on before it becomes a major catastrophe.

Combining statistical and machine learning techniques has drawn considerable attention as a way to improve fraud detection. Compared to conventional approaches, these techniques can analyse enormous volumes of structured and unstructured data, identify intricate relationships between variables, and generate more accurate predictions (Dechow, P. M., 2017, 881–915). Financial indicators such as accruals, leverage ratios, and profitability metrics have been widely used to detect anomalies, while governance-related factors such as board independence, audit quality, and ownership structure help characterise the organisational environment that can either encourage or prevent fraudulent activity. Furthermore, narrative disclosures in annual reports, management discussion and analysis sections, and other company communications contain abundant behavioural information that can serve as an indicator of fraud risk. Given the intricacy of financial statement fraud, academics and practitioners increasingly recognise that no single approach is sufficient to capture every aspect of fraudulent conduct. Instead, a comprehensive approach that integrates financial, governance, and behavioural indicators is required to improve detection efficacy. Standardisation techniques such as z-score normalisation are crucial because they allow variables measured on different scales to be compared and support more robust statistical analysis and model construction. Furthermore, machine learning models such as Support Vector Machines (SVM), Random Forest, and Decision Trees enhance predictive power by capturing nonlinear relationships and interactions among variables, while classification models such as logistic regression provide interpretable information on the significance of particular predictors.

This study contributes to the existing body of knowledge by proposing an integrated framework for financial statement fraud detection that spans several analytical dimensions. The proposed approach provides a comprehensive picture of fraud risk by combining standardised financial metrics, governance monitoring indicators, and narrative-based behavioural signals (Dong, W., 2020, 113–123). To support the dependability and generalisability of the results, the study also follows a structured research process that comprises data normalisation, composite risk scoring, and model validation using stratified training and testing data. The findings are expected to help auditors, regulators, forensic accountants, and company management by enabling the early identification of problem companies and improving decision-making processes. Ultimately, putting such integrated, data-driven solutions into practice can improve corporate governance, boost transparency, and restore trust in financial reporting systems, all of which support the stability and long-term viability of the world's financial markets.

Financial Statement Fraud and Its Determinants

Financial statement fraud is a deliberate act intended to present an inaccurate picture of a company's performance and financial position. Its drivers are frequently explained by the fraud triangle, which comprises pressure, opportunity, and rationalisation. Financial pressure can arise from poor performance, debt obligations, or fierce competition, which pushes management to manipulate earnings to meet expectations. Inadequate governance frameworks, lax internal controls, and a lack of regulatory supervision create opportunities for fraudulent activity to occur and go unnoticed (Gepp, A., 2018, 102–115). Rationalisation, in turn, is the justification of unethical behaviour by those who believe it is harmless or that they are acting morally. High executive incentives, weak corporate governance, a lack of transparency, and complicated financial reporting are among the factors that contribute to financial statement fraud (Kotsiantis, S., 2018, 326–336). Additionally, off-balance-sheet transactions and sophisticated accounting techniques have become more common, making it easier to conceal fraudulent activity. Understanding these determinants is essential for building efficient detection tools, because it allows stakeholders to identify risk factors and take preventive action. Organisations can improve internal controls and reduce the likelihood of false financial reporting by addressing these underlying causes (Kukreja, G., 2020, 773–784).

Need for an Integrated Fraud Detection Approach

The complexity and growing prevalence of financial statement fraud call for detection techniques that go beyond traditional auditing. Conventional methods rely mostly on financial ratio analysis and manual review, which are not always enough to spot hidden patterns or fraud risks (Li, Y., 2021, 145–160). As fraud schemes evolve, advanced analytical tools are needed that can process large amounts of data, identify irregularities, and produce accurate predictions. An integrated approach combines several dimensions of analysis, including financial measures, governance factors, and behavioural indications derived from narrative disclosures (Perols, J., 2018, 1–20). Financial indicators help detect anomalies in accounting figures, while governance variables provide information on the effectiveness of an organisation's monitoring procedures. Narrative analysis, in turn, examines qualitative disclosures in order to spot discrepancies or misleading messages that could point to fraud. Statistical standardisation techniques such as z-score normalisation ensure the consistency and comparability of variables, which improves analytical accuracy (Ravisankar, P., 2017, 491–500). Machine learning models such as Support Vector Machines, Random Forest, and Decision Trees are used to find complex and nonlinear relationships between variables that more traditional approaches could overlook. By combining these techniques, organisations can build more effective and efficient fraud detection systems, which ultimately strengthens stakeholder credibility, accountability, and transparency.

Objectives of the Study

To develop an integrated framework for detecting financial statement fraud using financial, governance, and behavioral indicators.
To analyze the effectiveness of z-score standardization in improving data comparability and model performance.
To evaluate the predictive accuracy of statistical and machine learning models such as Logistic Regression, SVM, Random Forest, and Decision Tree.
To identify key variables that significantly contribute to fraud risk classification and provide actionable insights for stakeholders.

Limitations of the Study and Managerial Implications

Limitations of the Study:

This research has certain limitations. First, the study relies on secondary data, which may be skewed or erroneous. Second, information on governance and narrative disclosures may not be consistently available across companies and over time. Third, despite their power, machine learning models can overfit or become difficult to interpret under some circumstances. Finally, because businesses and regions differ in their regulatory frameworks and reporting traditions, the results may not be fully generalisable to other settings.

Managerial Implications:

Despite these drawbacks, managers, auditors, and regulators can benefit from the research. The integrated fraud detection system may help organisations identify early indicators of financial statement fraud and so manage risks more effectively. The results can help managers strengthen corporate governance procedures and internal controls and make financial reporting more transparent. Auditors and forensic experts might use the suggested models to flag high-risk cases and allocate resources more efficiently. In general, the research supports the use of data-driven decision-making strategies to reduce fraud risk and promote ethical corporate conduct.

REVIEW OF LITERATURE

Sodnomdavaa, T. (2025) The use of financial ratios and mathematical models to identify irregularities in corporate reporting has been extensively covered in previous research on financial statement fraud detection. Early studies emphasised ratio analysis as a key tool for spotting anomalies in profitability, liquidity, and leverage, suggesting that unusual shifts in these metrics may point to potential manipulation. Researchers found that the Beneish M-score and the Altman Z-score are useful for flagging possible earnings management and predicting financial distress, respectively. However, these approaches typically rely heavily on historical financial data and may not take into account the qualitative aspects of fraud. According to subsequent studies, financial indicators are important, but because fraud schemes are dynamic, they cannot be relied on by themselves. As a result, academics began advocating for the inclusion of characteristics other than financial data. This strand of research suggests that improving detection accuracy and reducing false positives requires combining conventional and more advanced financial analysis techniques.

Haq, M. A. (2024) The effectiveness of corporate governance in preventing and identifying financial statement fraud has been extensively studied in the literature. According to this research, sound governance structures, including an independent board, a strong audit committee, and transparent ownership, are among the most crucial factors in reducing the likelihood of fraud. Conversely, inadequate governance helps create an environment in which managerial opportunism may flourish. Empirical data shows that companies with lax internal controls and weak regulatory oversight are more likely to experience fraudulent reporting and earnings manipulation. Furthermore, external auditors and institutional investors have been found to enhance accountability and oversight. However, a number of studies demonstrate that governance mechanisms alone cannot completely eliminate the risk of fraud, since sophisticated managers may still exploit systemic flaws. As a result, researchers have proposed hybrid models that incorporate both financial measurements and governance indicators. The research generally demonstrates that governance variables are important components in the assessment of fraud risk and should be included in any overall framework for fraud detection.

Al-Shammari, M. (2024) Research on fraud detection has been greatly influenced by recent advancements in machine learning and data analytics. Scholars have investigated the classification of fraudulent and non-fraudulent financial statements using algorithms such as Decision Trees, Support Vector Machines (SVM), Artificial Neural Networks, and Random Forest. Because these models can handle large datasets and identify complex, nonlinear relationships between variables, they typically outperform traditional statistical methods. Studies comparing machine learning methods to logistic regression have shown that ensemble methods, such as Random Forest, often achieve higher accuracy. However, model interpretability and transparency remain a challenge, particularly in regulatory and auditing settings where explainability is crucial. A potential solution is to strike a balance between interpretable models and high-performance algorithms so that accuracy and transparency are both preserved. The literature suggests that machine learning can transform fraud detection, especially when paired with domain expertise and traditional analytical techniques.

Elshafie, H. (2022) The analysis of narrative disclosures and textual data to detect financial fraud is another important area of study. As unstructured data in annual reports, management discussions, and corporate communications becomes more accessible, scholars are examining the linguistic and behavioural traces of fraud. According to this research, dishonest businesses frequently use ambiguous language, overstated prospects, and complex terminology when reporting bad news. Sentiment analysis, readability metrics, and keyword frequency analysis have been employed to extract useful information from textual data. These techniques complement quantitative financial analysis by offering a view of managers' intentions and behavioural patterns. However, text data is difficult to standardise, and reporting formats are not consistent across companies. Despite these drawbacks, adding narrative analysis to fraud detection models has improved the ability to identify manipulation that would otherwise be hard to detect from numerical data alone.

Zhao, Q., & Lai, D. (2022) The most recent literature has emphasised the need for integrated and multifaceted fraud detection techniques. According to researchers, financial statement fraud is a complex phenomenon influenced by a combination of behavioural tendencies, governance failures, and financial hardship. To improve prediction performance, studies have proposed integrated frameworks that include financial ratios, governance metrics, and textual data. Data standardisation techniques, such as z-score normalisation, have been emphasised as crucial for making variables measured on different scales comparable. Empirical data indicates that integrated models are more accurate, dependable, and robust than single-method approaches. The use of hybrid approaches that combine statistical models with machine learning algorithms is another recent development. These techniques not only enhance detection capacity but also provide auditors and regulators with practical guidance. Overall, the body of research strongly supports integrated frameworks as a more effective way to address the complexity of financial statement fraud in the modern corporate environment.

METHODOLOGY

Research Design and Analytical Framework

This work develops and evaluates an integrated strategy for identifying financial statement fraud using a quantitative, multi-stage analytical design. The study framework builds a comprehensive fraud risk model by integrating narrative disclosure measures, governance monitoring indicators, and conventional financial ratios. The design follows a sequence that begins with data preprocessing and standardisation and then proceeds through descriptive analytics, correlation testing, logistic regression for interpretability, and machine learning classification for predictive evaluation. The study uses a composite fraud risk scoring system with three main components, the Financial Pressure Score (FPS), Governance Opportunity Score (GOS), and Narrative Behaviour Score (NBS), to determine which observations are high risk and which are low risk. These scores reflect the different factors that contribute to fraud risk and allow the efficacy of the integrated model to be gauged. The overall architecture is intended to support a comprehensive evaluation of both the explanatory and the predictive components of fraud detection.
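The text does not state how the three sub-scores are computed or weighted, so the following is only an illustrative sketch of one way a composite score could be assembled from standardised inputs. The equal-weight averaging, the sign conventions, the median cut-off, and every function name are assumptions made for illustration and are not the study's scoring rules.

```python
from statistics import mean, median

# Hypothetical sign conventions: +1 means a higher z-score is read as more
# risk, -1 means a higher z-score is read as less risk. These are assumptions.
FPS_SIGNS = {"Z_ROA": -1, "Z_ROE": -1, "Z_CurrentRatio": -1,
             "Z_DebtEquity": +1, "Z_Accrual": +1}
GOS_SIGNS = {"Z_IndepDir": -1, "Z_AuditIndep": -1, "Z_Promoter": +1}


def sub_score(row, signs):
    """Average the sign-aligned z-scores belonging to one dimension."""
    return mean(sign * row[name] for name, sign in signs.items())


def fraud_risk_score(row, nbs):
    """Equal-weight combination of the three sub-scores (assumed weights)."""
    fps = sub_score(row, FPS_SIGNS)
    gos = sub_score(row, GOS_SIGNS)
    return {"FPS": fps, "GOS": gos, "NBS": nbs, "FRS": mean([fps, gos, nbs])}


def label_by_median(frs_values):
    """Label observations above the sample median as High Risk (assumed rule)."""
    cut = median(frs_values)
    return ["High Risk" if v > cut else "Low Risk" for v in frs_values]
```

Under these assumptions, a company-year whose only unusual value is a high accrual z-score receives a positive FPS and therefore a higher FRS, which mirrors the intuition that each dimension contributes separately to the composite.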

Data Collection, Variables, and Standardization

Each company-year observation in the dataset was labelled as either "High Risk" or "Low Risk" to facilitate comparisons. To maintain uniformity among scale-dependent variables, all governance and financial metrics were converted to standardised z-scores. According to descriptive statistics, the normalisation succeeded in making the variables more comparable by bringing their means close to zero and their standard deviations close to one. The key variables are accrual levels, audit independence metrics, board independence measures, promoter shareholding patterns, liquidity measures (such as the Current Ratio), leverage indicators (such as the Debt-to-Equity Ratio), and profitability ratios. Together these variables represent both financial performance pressure and the quality of governance, two important determinants of the likelihood of fraud. In addition, narrative indicators obtained from textual analysis of annual report disclosures were used to add a behavioural dimension to the dataset. This set of variables is in line with the theoretical frameworks of the fraud triangle and the fraud diamond, which highlight the financial, governance, and behavioural aspects of deceit.
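The standardisation step described above can be sketched as follows. This is a minimal illustration under stated assumptions: the sample values are made up, the helper name is hypothetical, and the population standard deviation is used because the text does not say which variant the study applied.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Standardise a list of numbers to mean 0 and population std dev 1."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        raise ValueError("cannot standardise a constant variable")
    return [(v - mu) / sigma for v in values]


# Hypothetical ratios measured on different scales (not the study's data).
debt_equity = [0.4, 0.9, 1.5, 2.8, 0.7]
current_ratio = [1.2, 2.5, 0.8, 1.9, 3.1]

z_de = z_scores(debt_equity)
z_cr = z_scores(current_ratio)
```

After the transformation both variables are expressed in standard-deviation units, so a value of 1.5 means the same thing for leverage as it does for liquidity: one and a half standard deviations above that variable's own mean.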

Correlation and Multicollinearity Assessment

Before building the prediction models, we used correlation analysis to examine the relationships between the final Fraud Risk Score (FRS), the underlying z-score variables, and the FPS, GOS, and NBS component scores. This allowed us to assess the degree of multicollinearity and to check whether the variables behaved in accordance with our theoretical expectations. The results showed that FRS had a strong association with financial pressure and a moderate correlation with governance opportunity and narrative behaviour, and that the sub-component scores were only weakly correlated with one another, which confirms that each dimension represents different risk features. Because correlations across predictors were low to moderate, these variables could be used in logistic regression and machine learning models without major multicollinearity difficulties. This assessment supports the statistical soundness of the subsequent analyses.

Logistic Regression for Interpretability

To determine whether the standardised factors are associated with a company-year observation's High Risk classification, logistic regression was used as an interpretable baseline model. The final regression model used only the raw z-score variables, because engineered sub-scores such as FPS, GOS, and NBS produced perfect-separation warnings. To eliminate imbalance-related biases, a balanced subset of 60 observations was created, with 30 classified as High Risk and 30 as Low Risk. According to the regression analysis, accrual levels were a statistically significant predictor of high-risk classification. However, owing to the limited sample, the coefficients for the profitability and leverage variables should be interpreted with care, even where they were directionally meaningful. The primary functions of the logistic regression were to provide interpretability and to illustrate the underlying contribution of the individual variables.
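Drawing a balanced subset of this kind can be sketched as below. The text does not describe how the 60 observations were selected, so simple random sampling within each class, the fixed seed, the helper name, and the class counts in the example are all assumptions.

```python
import random


def balanced_subsample(labels, per_class, seed=0):
    """Return sorted row indices with `per_class` rows drawn from each label."""
    rng = random.Random(seed)
    chosen = []
    for label in sorted(set(labels)):
        idx = [i for i, lab in enumerate(labels) if lab == label]
        if len(idx) < per_class:
            raise ValueError(f"only {len(idx)} rows labelled {label!r}")
        chosen.extend(rng.sample(idx, per_class))
    return sorted(chosen)


# Hypothetical label column; the real class counts are not given in the text.
labels = ["High Risk"] * 40 + ["Low Risk"] * 66
subset = balanced_subsample(labels, per_class=30)
```

Sampling the same number of rows from each class means the fitted intercept is not driven by class frequencies, at the cost of discarding observations from the larger class.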

Machine Learning Classification and Model Evaluation

Three machine learning classifiers were trained using a 70/30 stratified train-test split: Decision Tree, Support Vector Machine (SVM) with a Radial Basis Function (RBF) kernel, and Random Forest. Their predictive performance was then evaluated on two feature sets. The first included only the standardised z-score predictors, whereas the second added the engineered composite sub-scores: the Financial Pressure Score (FPS), Governance Opportunity Score (GOS), and Narrative Behaviour Score (NBS). Confusion matrices and overall accuracy were used to evaluate model performance. With the engineered scores included, Support Vector Machine and Random Forest both reached the highest predictive accuracy of 0.9688. The Decision Tree classifier, by contrast, performed noticeably worse than the other two models. These results underline the efficacy of a multidimensional framework in improving fraud detection, and they suggest that the proposed integrated method is robust.
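The evaluation procedure can be sketched with scikit-learn as follows. This is an illustration only: the data are synthetic, the 106-row sample size is an assumption chosen so that a 30% holdout contains 32 observations as in the reported results, and all hyperparameters are library defaults rather than settings taken from the study.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for eight z-score predictors; NOT the study's data.
X, y = make_classification(n_samples=106, n_features=8, n_informative=4,
                           n_redundant=0, random_state=0)

# Stratified 70/30 split keeps the class proportions similar in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=0)

models = {
    "SVM (RBF)": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
    "Random Forest": RandomForestClassifier(random_state=0),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    results[name] = {
        "accuracy": accuracy_score(y_test, pred),
        "confusion": confusion_matrix(y_test, pred),
    }
    print(f"{name}: accuracy={results[name]['accuracy']:.4f}")
```

To compare the two feature sets, the same split would be reused with the sub-score columns appended to `X`, so that any change in accuracy reflects the added features rather than a different holdout sample.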

RESULTS

Descriptive Statistics of Standardized Variables

Descriptive statistics summarise the distributional properties of the standardised financial and governance variables from which the fraud-risk metrics were built. All predictors were transformed into z-scores, so their means cluster around zero and their standard deviations are close to one, which confirms effective normalisation across variables with different scales. This standardisation was essential for the subsequent regression and machine-learning comparisons. No outliers that might skew the model estimates were identified for profitability (Z_ROA, Z_ROE), liquidity (Z_CurrentRatio), leverage (Z_DebtEquity), accrual intensity (Z_Accrual), or the governance measures (Z_IndepDir, Z_AuditIndep, Z_Promoter). Notably, compared to Low Risk companies, High Risk ones showed weaker governance and stronger average financial pressure signals. These patterns are in line with fraud-risk theory, which suggests that weak monitoring systems and financial stress can raise the likelihood of misreporting.

Table 1: Descriptive Statistics of Standardized Variables (Z-Scores)

Variable	Mean	Std Dev	Min	Max
Z_ROA	0.00	0.99	-2.64	4.39
Z_ROE	0.00	0.99	-5.56	3.27
Z_CurrentRatio	0.00	0.99	-1.60	6.20
Z_DebtEquity	0.00	0.99	-1.44	4.09
Z_Accrual	0.00	0.99	-3.28	4.93
Z_IndepDir	0.00	0.99	-1.87	2.65
Z_AuditIndep	0.00	0.99	-1.16	2.07
Z_Promoter	0.06	1.00	-2.64	2.15

Correlation Analysis of Composite and Underlying Indicators

Correlation analysis was used to examine the composite scores: the Final Fraud Risk Score (FRS), the Financial Pressure Score (FPS), the Governance Opportunity Score (GOS), and the Narrative Behaviour Score (NBS). The goal was to evaluate the level of multicollinearity prior to modelling and to find out whether the variables moved in the theoretically predicted directions. A high positive correlation between FRS and FPS (r = .851) suggests that financial pressure has a strong influence on risk categorisation. GOS (r = .410) and NBS (r = .352) both showed moderate positive correlations with FRS, indicating that governance and narrative behaviour add explanatory value. The three sub-scores are only weakly correlated with one another (|r| < .10), which suggests that FPS, GOS, and NBS each capture a different aspect of fraud risk. This justifies combining them into one model.

Table 2: Correlation Matrix

	FPS	GOS	NBS	FRS
FPS	1	-0.067	0.079	0.851
GOS	-0.067	1	0.095	0.410
NBS	0.079	0.095	1	0.352
FRS	0.851	0.410	0.352	1
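The multicollinearity check among the three sub-scores can be reproduced directly from the values in Table 2. The sketch below is a minimal illustration; the 0.80 cut-off is a common rule of thumb adopted here as an assumption, not a threshold stated in the study, and the check covers only the sub-scores, not the underlying z-score variables.

```python
# Correlation matrix copied from Table 2.
names = ["FPS", "GOS", "NBS", "FRS"]
corr = [
    [1.000, -0.067, 0.079, 0.851],
    [-0.067, 1.000, 0.095, 0.410],
    [0.079, 0.095, 1.000, 0.352],
    [0.851, 0.410, 0.352, 1.000],
]

predictors = ["FPS", "GOS", "NBS"]
THRESHOLD = 0.80  # assumed rule-of-thumb cut-off, not taken from the study

# Collect each unordered pair of predictors once.
pairs = {}
for i, a in enumerate(predictors):
    for b in predictors[i + 1:]:
        pairs[(a, b)] = corr[names.index(a)][names.index(b)]

largest = max(abs(r) for r in pairs.values())
flagged = [pair for pair, r in pairs.items() if abs(r) >= THRESHOLD]
symmetric = all(corr[i][j] == corr[j][i] for i in range(4) for j in range(4))
```

The largest absolute correlation among the predictors is 0.095 (GOS with NBS), so no pair is flagged. The high FRS–FPS correlation is not a multicollinearity concern in itself, because FRS is the quantity from which the risk label is derived rather than a predictor.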

Logistic Regression Output for High-Risk Classification

The association between the standardised predictors and classification into the High Risk and Low Risk groups was examined using logistic regression, which provided an interpretable baseline model. To prevent perfect-separation warnings and circularity arising from the engineered scores, the final model incorporated only the raw z-score predictors. To ensure transparency and minimise class-imbalance bias, the analysis was performed on a balanced subsample of 60 observations, with 30 classified as High Risk and 30 as Low Risk. The overall model was statistically significant (p < .001). Among the predictors, accruals (Z_Accrual) stood out as a strong positive predictor (p = .005), indicating that higher accrual intensity substantially raises the probability of being classified as high risk. Leverage (Z_DebtEquity) had a significant negative coefficient (p = .032), suggesting a link with financial-health abnormalities, whereas the profitability coefficients (Z_ROA, Z_ROE) were not statistically significant.

Table 3: Binary Logistic Regression Results

Predictor	B	SE	Z	p-value	Odds Ratio	95% CI (OR)
Constant	0.081	0.776	0.104	.917	1.084	[0.237, 4.964]
Z_ROA	-4.370	2.779	-1.562	.118	0.013	[0.000, 3.049]
Z_ROE	2.104	3.012	0.699	.485	8.201	[0.022, 3001.030]
Z_DebtEquity	-4.557	2.125	-2.144	.032	0.010	[0.000, 0.676]
Z_Accrual	5.006	1.770	2.829	.005	149.299	[4.653, 4790.283]
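The odds ratios and confidence intervals in Table 3 can be related to the coefficients with the usual Wald formulas, OR = exp(B) and 95% CI = exp(B ± 1.96 × SE). The sketch below applies them to the reported values; it assumes Wald intervals were used, which the text does not state, and small differences from the table are expected because the published B and SE are rounded.

```python
import math

Z_CRIT = 1.96  # normal critical value for a two-sided 95% interval


def odds_ratio_ci(b, se, z_crit=Z_CRIT):
    """Return (odds ratio, lower bound, upper bound) for one coefficient."""
    return (math.exp(b), math.exp(b - z_crit * se), math.exp(b + z_crit * se))


# Coefficients (B) and standard errors (SE) copied from Table 3.
table3 = {
    "Constant": (0.081, 0.776),
    "Z_ROA": (-4.370, 2.779),
    "Z_ROE": (2.104, 3.012),
    "Z_DebtEquity": (-4.557, 2.125),
    "Z_Accrual": (5.006, 1.770),
}

for name, (b, se) in table3.items():
    odds, low, high = odds_ratio_ci(b, se)
    print(f"{name}: OR={odds:.3f}, 95% CI=[{low:.3f}, {high:.3f}]")
```

An interval that excludes 1 corresponds to significance at the 5% level, which holds for Z_Accrual and Z_DebtEquity. The very wide interval for Z_Accrual is consistent with the small sample of 60 observations and is a reason to read the point estimate cautiously.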

Machine Learning Model Accuracy (Z-Only Feature Set)

Machine-learning classifiers were evaluated for predictive performance using a stratified 70/30 train-test split. Under the Z-only feature set, Random Forest and Support Vector Machine (SVM) with an RBF kernel both obtained an accuracy of 0.9375 on the 32 test observations, with just two misclassifications each. The Decision Tree model reached a moderate accuracy of 0.8125. These results show that non-linear classifiers can identify patterns linked to financial statement fraud risk even without the engineered scores.

Table 4: ML Accuracy: Z-Only Feature Set

Model	Accuracy
SVM (RBF)	0.9375
Random Forest	0.9375
Decision Tree	0.8125

Performance with Integrated Feature Set (Z + FPS/GOS/NBS)

All three models improved when the engineered composite scores were added to the feature set. SVM and Random Forest each achieved an accuracy of 0.9688. As the confusion matrices show, each of these models misclassified only a single observation, indicating strong discriminating power. The Decision Tree's accuracy rose to 0.8750, a clear improvement over the Z-only feature set. These results support the idea that combining financial, governance, and narrative dimensions makes models more robust and helps to distinguish high-risk from low-risk observations.

Table 5: ML Accuracy: Integrated (Z + FPS/GOS/NBS) Feature Set

Model	Accuracy
SVM (RBF)	0.9688
Random Forest	0.9688
Decision Tree	0.8750
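Because the holdout set is small, it helps to translate the accuracies in Tables 4 and 5 back into counts of misclassified observations. The sketch below does this arithmetic, assuming the same 32 test observations were used for both feature sets; the reported value 0.9688 is consistent with 31 of 32 correct.

```python
N_TEST = 32  # holdout size reported for the Z-only feature set

# Accuracies copied from Tables 4 and 5.
reported = {
    "Z-only": {"SVM (RBF)": 0.9375, "Random Forest": 0.9375,
               "Decision Tree": 0.8125},
    "Integrated": {"SVM (RBF)": 0.9688, "Random Forest": 0.9688,
                   "Decision Tree": 0.8750},
}


def misclassified(accuracy, n=N_TEST):
    """Number of wrong predictions implied by an accuracy on n observations."""
    return round((1 - accuracy) * n)


errors = {
    feature_set: {model: misclassified(acc) for model, acc in scores.items()}
    for feature_set, scores in reported.items()
}
```

On this reading, adding the sub-scores reduces the errors of SVM and Random Forest from two to one and those of the Decision Tree from six to four, so each gain rests on one or two test observations.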

CONCLUSION

The results of this study show that an integrated, multidimensional strategy provides a more solid and trustworthy way to identify financial statement fraud than conventional single-indicator methods. By merging financial pressure indicators with governance opportunity metrics and narrative behaviour signals, the proposed methodology captures the complexity of fraud risk better than models that depend exclusively on financial or structural data. Descriptive statistics and correlation analysis indicate that each dimension adds distinct explanatory power, while the logistic regression findings show that accrual activity and leverage anomalies significantly affect fraud-risk classification. The high performance of the machine-learning models, specifically SVM and Random Forest, supports the combined use of engineered risk scores and standardised predictors. Predictive accuracy increased with the inclusion of the composite measures, demonstrating the value of integrating multiple fraud-risk indicators into a single analytical framework. The consistently strong performance across models suggests that fraud risk is best described as a multi-layered phenomenon shaped by pressures, opportunities, and behavioural intent. In sum, the research supports the adoption of data-driven, integrated strategies for fraud detection. The results also have practical implications for auditors, regulators, and financial analysts who are looking for early warning systems that can spot high-risk firms before fraudulent activities escalate. Although the results are encouraging, future studies using larger samples and validated fraud cases are needed to strengthen the model's predictive power and to test its applicability across industries and regulatory environments.

References

Beneish, M. D., Lee, C. M. C., & Nichols, D. C. (2017). Earnings manipulation and expected returns. Financial Analysts Journal, 73(2), 57–82.
Chen, J., Cumming, D., Hou, W., & Lee, E. (2018). Executive integrity, audit opinion, and fraud detection. Journal of Business Ethics, 151(4), 1009–1028.
Dechow, P. M., Ge, W., Larson, C. R., & Sloan, R. G. (2017). Predicting material accounting misstatements. Contemporary Accounting Research, 34(2), 881–915.
Dong, W., Liao, S., & Zhang, Z. (2020). Leveraging financial ratios and machine learning for fraud detection. Expert Systems with Applications, 146, 113–123.
Gepp, A., Linnenluecke, M. K., O’Neill, T. J., & Smith, T. (2018). Big data techniques in auditing research and practice. Journal of Accounting Literature, 40, 102–115.
Kotsiantis, S., Koumanakos, E., Tzelepis, D., & Tampakas, V. (2018). Forecasting fraudulent financial statements using data mining. International Journal of Computational Intelligence Systems, 11(1), 326–336.
Kukreja, G., Gupta, S., Sarea, A., & Kumaraswamy, S. (2020). Beneish M-score and fraud detection: Evidence from emerging economies. Journal of Financial Crime, 27(3), 773–784.
Li, Y., & Liu, C. (2021). Application of machine learning in financial fraud detection: A review. IEEE Access, 9, 145–160.
Perols, J. (2018). Financial statement fraud detection: An analysis of statistical and machine learning algorithms. Auditing: A Journal of Practice & Theory, 37(2), 1–20.
Ravisankar, P., Ravi, V., Rao, G. R., & Bose, I. (2017). Detection of financial statement fraud using data mining techniques. Decision Support Systems, 50(2), 491–500.
Lkhagvadorj, G., & Sodnomdavaa, T. (2025). Financial statement fraud detection through an integrated machine learning and explainable AI framework. Preprints.
Ismail, M. M., & Haq, M. A. (2024). Enhancing enterprise financial fraud detection using machine learning. Engineering, Technology & Applied Science Research, 14(4), 854–861.
Alghasra, A. Y., Almulla, D., Abbas, M., & Al-Shammari, M. (2024). Machine learning for detecting financial statement fraud: A bibliometric analysis. Proceedings of the International Conference on Data Analytics for Business and Industry.
Ali, A., Razak, S. A., & Elshafie, H. (2022). Financial fraud detection based on machine learning: A systematic literature review. Applied Sciences, 12(19), 637.
Zhao, Q., & Lai, D. (2022). Financial fraud detection and prediction in listed companies using SMOTE and machine learning algorithms. Mathematics, 10(16).