Predicting Number of Accidents and Black Spot of a Route Using Genetic Algorithm

Improving Traffic Accident Prediction with Genetic Algorithm-based Method

by Miss. Nikita Ingawale*, Dr. R. R. Sorte,

- Published in Journal of Advances in Science and Technology, E-ISSN: 2230-9659

Volume 19, Issue No. 2, Apr 2022, Pages 171 - 126 (10)

Published by: Ignited Minds Journals


ABSTRACT

Because of its importance in saving human lives, traffic accident prediction is a motor vehicle traffic challenge. There are various studies in the literature that use artificial neural networks (ANNs), support vector machines (SVMs), decision trees (DTs), and other categorization approaches to predict the severity of traffic accidents. Indeed, the fundamental shortcoming of ANNs and SVMs is their lack of human interpretation, whereas the main disadvantage of traditional DTs like C4.5, ID3, and CART is their low accuracy. To solve these flaws, we present a Genetic Algorithm-based method to predict traffic accidents based on user preferences instead of traditional DTs in this review.We customised a genetic algorithm, to optimise and find rules based on Support, Confidence, and Comprehensibility metrics in the suggested method. The suggested method's goal is to provide facilities for users, such as traffic cops, road and transportation engineers, to make use of their knowledge while balancing all of the competing objectives. A traffic accident data set of accidents in rural and urban roadways in Tehran Province, Iran, was used to assess the suggested technique during a five-year period (2008–2013). According to the evaluation results, the proposed technique outperforms classification methods such as ANN, SVM, and traditional DTs in terms of classification metrics such as accuracy and rule performance metrics such as support and confidence.

KEYWORD

traffic accident prediction, artificial neural networks, support vector machines, decision trees, genetic algorithm, user preferences, Support, Confidence, and Comprehensibility metrics, traffic cops, road and transportation engineers, accident data set

I. INTRODUCTION

The number of traffic accidents globally is increasing as the world's population expands and cars become more widespread. Improved geometric design, congestion management tactics, and greater driver education and enforcement are all traditional ways to prevent collisions. While these procedures are often beneficial, they are frequently impractical or prohibitively expensive to put in place. Many factors play a role in traffic accidents, and some of them have a significant impact on one another, making it impossible for transportation safety engineers to use just one parameter to adequately explain the severity of traffic accidents. The number of traffic crashes can be reduced by studying parameters involved in traffic crashes utilizing combined contemporary models that include the interplay of input and output variables. Road and traffic accidents are a leading source of death and disability around the world. A collision between a conveyance and another conveyance, a person, or other items is referred to as a road contingency. A road accident not only causes physical damage, but it can also cause partial or complete incapacitation, and in some cases, death. The rising number of traffic accidents is not a healthy indicator for vehicle safety. The only solution is to analyze traffic congestion data in order to discover various causes of road accidents and take preventative actions. One of the most important tools for analyzing the relationship between crash incidence and risk factors associated with various traffic entities is the crash prediction model (also known as the safety performance function). Every year, more than 28000 people are killed on Iranian roadways, resulting in severe economic and social effects. The demographic or behavioral characteristics of the driver (vehicle speed, driver's age and gender, seat belt use), environmental factors and roadway conditions at the time of the crash (crash time, weather conditions, road surface, crash type, collision type, traffic flow), and technical characteristics of the vehicle itself (vehicle type and safety) all have a significant impact on traffic accident severity. and unconstrained optimization problems. The third model we look at is one that combines genetic algorithm (GA) and other models. Because the use of GA models in transportation safety research is still relatively new, we'll combine these models to improve forecast accuracy. Because the incidence of accidents on a highway segment might be considered a random event, previous research examining accident frequency has primarily relied on statistical methods such as linear regression models, Poisson regression models, and/or negative binomial regression models.

Genetic algorithm

GAs was invented by John Holland in 1970s. There are several terminologies in GA, which are clarified in the following. A chromosome is a set of genes and shows a solution to the particular problem. Each gene presents a particular property of the solution. The set of chromosomes at a given time is called a population. An evolutionary process is executed iteratively, initialized by a randomly chosen population. Each iteration is called generation. At each generation, such evolution is achieved by applying mutation and crossover operators between different chromosomes. The next generation of population includes the fittest parents or children chromosomes, and is done via a replacement scheme. Finally, the evolutionary process terminates when a population satisfies several predetermined conditions. There are several variations of this overall evolutionary process. Therefore, to use a GA, we need an encoding of the solution, i.e. representation of the solution as a chromosome, an initial population, mutation and crossover operators, a fitness function along with selection and replacement schemes. Algorithm 1 gives the pseudo code for a GA.

On the Road Safety

The availability of transportation is inextricably linked to a country's economic development, and the goal of transportation systems is to make moving freight and passengers from one area to another as efficient and safe as possible. Traffic accidents, which resulted in the loss of lives and property, had become a severe social concern as the number of automobiles on the road increased. According to various accident studies, road accidents are caused by negligence and a lack of road safety standards, rather than natural reasons. Environmental conditions, such as fog in the winter, play a role in the causes of traffic accidents. As a result, road/highway safety has become a need in today's world. A genetic algorithm is based on natural selection, the mechanism that drives biological evolution, and it can solve both limited and unconstrained optimization problems. A population of individual solutions is repeatedly modified by genetic algorithms. At each phase, the genetic algorithm chooses parents at random from the current population and uses them to produce the following generation's children. The population 'evolves' toward an optimal solution over generations. Genetic algorithms can be used to tackle a range of optimization problems that aren't well-suited to traditional methods, such as issues with discontinuous, non-differentiable, stochastic, or highly nonlinear objective functions. Holland (1992) invented this method in the 1960s and 1970s, and it was popularized by one of his students, Goldberg, who used it to solve a challenging problem for his dissertation concerning gas-pipeline transmission management (Goldberg 1989). Holland's schema theorem was the first attempt to build a theoretical foundation for Gas. De Jong's (1975) work was the first systematic effort to improve GA parameters and established the utility of GAs for function optimization. Crossover (exchanging randomly chosen slices of a chromosome) and mutation (changes in a randomly chosen piece of a chromosome) are GA operators. Figure 1 depicts the GA's genetic cycle, in which the best individuals are continually picked and modified through crossover and mutation.

Figurw1.1: The general structure of Genetic algorithm

III. PROBLEM STATEMENT

The problem at city by pass Highway is due to insufficient Highway space for the vehicles to pass through the junction at different times of the day, which is affecting the free flow of traffic, and improper movement of traffic also results in the occurrence of accident at different times of the day, according to different examinations such as Highway survey and traffic analysis."

IV. AIM OF PROJECT

The purpose of this research is to discover the key factors that determine the occurrence of accidents

V. OBJECTIVES

The goal of this work is to offer a review of the state of the art in using genetic algorithms and advanced data analysis techniques to anticipate road accidents. To collect data on age, gender, vehicle, time, place, month, and other factors in order to better understand the black spot in a stretch. Convert this data to binary so that it can be used in a Genetic Algorithm. Then we use a statistical model to compare the best and worst solutions in order to get the best solution by selecting the fittest parent, cross covering to obtain its mutation, and generating its offspring. Use this data to determine if the GA result corresponds to the actual data. And then put it into practice.

VI. RESEARCH METHODOLOGY

• Examine and classify the most often used data sources for analysis and predictive research. • Detection of traffic events in real time • Describe the steps used to conduct out data analysis on road accidents using genetic algorithms, including their strengths and limits. • Using GA to predict road accidents. • Comparing real-time and GA data to determine if they may be used to anticipate accidents.

Figure1.2: Methodology Flow What is GA?

John Holland invented GAs in the 1970s. In GA, there are various terms that are explained in the following sections. A chromosome is a collection of genes that represents a solution to a specific problem. Each gene represents a unique characteristic of the solution. A population is a collection of chromosomes at a specific point in time. An evolutionary process is repeated iteratively, with a randomly chosen population as the starting point; each iteration is referred to as a generation. The application of mutation and crossover operators between separate chromosomes is used to achieve such evolution at each generation. The fittest parents or children's chromosomes are used to create the next generation of population, which is done through a replacement strategy. Finally, the evolutionary process comes to an end when a population meets a set of preset criteria.

Basic Terminology

Before diving into the topic of Genetic Algorithms, it's important to understand some basic terms that will be utilised throughout this course. Population is a subset of all conceivable (encoded) solutions to a problem. The population of a GA is similar to that of human beings; only we have Candidate Solutions representing human beings instead of human beings. Chromosomes A chromosome is an example of a solution to a problem. Gene A gene is one of the chromosome's element positions. Allele is the value that a gene has for a specific chromosome.

Figure 1.3: Basic Terminologies

machine can easily understand and manipulate. Phenotype: The population in the actual real-world solution space in which solutions are represented as they are in real-world settings. Phenotype and genotype spaces are the same for simple problems. The phenotypic and genotype spaces, on the other hand, are usually different in most circumstances. Decoding converts a solution from genotype to phenotype space, whereas encoding converts from phenotype to genotype space. Consider the 0/1 Knapsack Problem, for example. The Phenotype space is made up of solutions that only have the item numbers of the items to be chosen in them. However, it can be expressed as a binary string of length n in genotype space (where n is the number of items). A 0 at location x indicates that the Xth item is selected, while a 1 indicates the opposite. In this scenario, the genotype and phenotypic spaces are not the same. Fitness Function: A fitness function is a function that takes a solution as an input and returns the solution's suitability as an output. The fitness function and the objective function may be the same in certain circumstances, but they may be different in others depending on the challenge. Genetic Operators: These change the offspring's genetic makeup. Crossover, mutation, and selection are examples of these. The Foundation: A GA's basic structure is as follows: We begin by selecting parents for mating from an initial population (which may be generated at random or seeded by other heuristics). To create new offspring, use crossover and mutation operators on the parents. Finally, these offspring replace the population's current individuals, and the cycle resumes. In this approach, genetic algorithms attempt to approximate human evolution to a degree.

Figure1.4: GA Flow Chart ► Population Initialization

There are two primary methods to initialize a population in a GA. They are – • Random Initialization − Populate the initial population with completely random solutions. • Heuristic initialization − Populate the initial population using a known heuristic for the problem. It has been observed that using a heuristic to initialize the entire population can result in the population having similar solutions and very little diversity. It has been observed experimentally that random solutions are the ones that drive the population to optimality. As a result, with heuristic initialization, we simply seed the population with a few good solutions while filling the rest with random solutions, rather than filling the entire population with heuristic-based solutions. It has also been observed that heuristic initialization only affects the population's initial fitness in some cases, but in the end, it is the diversity of solutions that leads to optimality.

► Fitness Function

Simply put, the fitness function is a function that takes a candidate solution to a problem as input and outputs how "fit" or "good" the solution is with respect to the problem under consideration. Because the calculation of fitness value is repeated in a GA, it should be sufficiently fast. A slow computation of the fitness value can have a

Most of the time, the fitness function and the objective function are the same, because the goal is to maximize or minimize the given objective function. An Algorithm Designer may choose a different fitness function for more complex problems with multiple objectives and constraints. The following characteristics should be present in a fitness function: The fitness function must be computed quickly, and it must quantitatively measure how fit a given solution is or how fit individuals can be produced from the given solution. Because of the inherent complexities of the problem at hand, calculating the fitness function directly may be impossible in some cases. In such cases, we perform fitness approximation to meet our requirements.

► Selection

The process of selecting parents who will mate and recombine to produce offspring for the next generation is known as parent selection. Parental selection is critical to the GA's convergence rate because good parents drive individuals to better and fitter solutions. However, caution should be exercised to avoid one extremely fit solution taking over the entire population in a few generations, as these results in the solutions being close to one another in the solution space, resulting in a loss of diversity. Maintaining a high level of diversity in the population is critical to the success of a GA. Premature convergence is the taking up of the entire population by one extremely fit solution, and it is an undesirable condition in a GA.

► Crossover

In this chapter, we'll go over what a Crossover Operator is, as well as its other modules, as well as their uses and benefits. The crossover operator is analogous to biological crossover and reproduction. More than one parent is chosen in this case, and one or more offspring are produced using the genetic material of the parents. Crossover is typically used in a high-probability GA.

► Genetic Algorithm Mutations

In simplified terms, mutation is a small random change in the chromosome that results in a new solution. It is typically used with a low probability – pm to maintain and introduce diversity in the genetic population. The GA is reduced to a random search if the probability is very high. to converge, whereas crossover is not.

Operators of Mutation

In this section, we will go over some of the most common mutation operators. This is not an exhaustive list, and the GA designer may find that a combination of these approaches or a problem-specific mutation operator is more useful.

► Survivor Selection Genetic Algorithms

The Survivor Selection Policy determines who will be kicked out and who will be kept in the next generation. It is critical because it ensures that the fitter individuals are not pushed out of the population while also maintaining population diversity. Elitism is used by some GAs. Simply put, it means that the current fittest member of the population is always passed down to the next generation. As a result, the fittest member of the current population can never be replaced. The simplest policy is to exclude random members of the population, but this approach frequently leads to convergence issues, so the following strategies are commonly used.

► Condition of Termination

A Genetic Algorithm's termination condition is critical in determining when a GA run will end. It has been observed that the GA progresses very quickly at first, with better solutions arriving every little iteration, but this tends to saturate in the later stages, where the improvements are very small. Typically, we want a termination condition that ensures our solution is close to optimal at the end of the run. • When there has been no improvement in the population for X iterations. • When we reach a predetermined number of generations. • When the objective function value reaches a predefined threshold.

Genetic Algorithms - Application Areas

Genetic Algorithms are primarily used in optimization problems of various kinds, but they are frequently used in other application areas as well. In this section, we list some of the areas in which Genetic Algorithms are frequently used. These are – given objective function value under a given set of constraints. The approach to solve Optimization problems has been highlighted throughout the tutorial. • Economics − GAs are also used to characterize various economic models like the cobweb model, game theory equilibrium resolution, asset pricing, etc. • Neural Networks − GAs are also used to train neural networks, particularly recurrent neural networks. • Parallelization − GAs also have very good parallel capabilities, and prove to be very effective means in solving certain problems, and also provide a good area for research. • Image Processing − GAs are used for various digital image processing (DIP) tasks as well like dense pixel matching. • Vehicle routing problems − With multiple soft time windows, multiple depots and a heterogeneous fleet. • Scheduling applications − GAs are used to solve various scheduling problems as well, particularly the time tabling problem. • Machine Learning − as already discussed, genetics based machine learning (GBML) is a niche area in machine learning. • Robot Trajectory Generation − GAs have been used to plan the path which a robot arm takes by moving from one point to another. • Parametric Design of Aircraft − GAs have been used to design aircrafts by varying the parameters and evolving better solutions. • DNA Analysis − GAs have been used to determine the structure of DNA using spectrometric data about the sample. • Multimodal Optimization − GAs are obviously very good approaches for multimodal optimization in which we have to find multiple optimum solutions. • Traveling salesman problem and its applications − GAs have been used to solve the TSP, which is a well-known combinatorial problem using novel crossover and packing strategies The Pune-Bangalore Expressway (MPEW) is India's first concrete toll road with four lanes. It runs for 94.5 kilometres between Pune, Maharashtra's capital, and Pune, the state capital. Pune, the cultural and intellectual capital of Maharashtra, is India's financial capital. It was fully operationalized in the year 200. Due to human error and the high volume of traffic, MPEW has seen a high incidence of road accidents. The Highway Police have reported 872 accidents in the last three years. There were 232 fatal accidents, with 268 people killed, 392 people badly hurt, and 67 people suffering minor injuries.

Figure 1.5: Map Showing Pune - Bangalore Expressway Accidents by Vehicle Speed Distribution:

Fig 1.6: Accidents by Vehicle Speed Distribution:

Figure 1.6 depicts the accident distribution at various vehicle speeds. In fig.1.6, it was discovered that 61 percent of incidents happened at speeds of 90 km/hr, whereas only 13 percent occurred at speeds of 60 km/hr.

Distribution of accidents by Weather Conditions:

Fig 1.7 Distribution of accidents by Weather Conditions

Figure 1.6 depicts the accident distribution under various weather situations. It was shown that 84 percent of accidents happened in calm weather with no strong gusts. Only 11% of incidents occurred while it was raining and there were no strong gusts, and the other percentages are indicated in fig.1.7.

Distribution of accidents by Light Conditions:

Fig 1.8: Distribution of accidents by Light Condition

Figure 1.8 depicts the accident distribution for various lighting conditions. It was discovered that 71 percent of accidents occurred in daylight, while 21 percent occurred in darkness with lights lit turned on and illuminated light condition, as well as other percentages of light conditions given in fig.1.8

Distribution of accidents by Light Chainage(km): Fig 1.9 Distribution of accidents by Light Chainage (km):

RESULTS

The web app's home page is depicted in the image above. For safe data transfer and to utilise the Geolocation API, the web domain is secured with HTTPS, which was obtained from the certificate authority. The input for parameters taken from users is shown in the diagram. The vehicle_speed, weather conditions, road surface condition, and speed restriction are all factors to consider. The web page requests the browser to take user coordinates when the user clicks the update coordinates button. The GeoLocation API is utilised in the backend Streamlit module to get the user's location. The user's data is displayed in the diagram above. When the user selects Predict, the data is transferred to the backend, where it is fed into the Genetic Technique, our selected machine learning algorithm. The production is predicted based on the number of accidents.

CONCLUSION

Based on input data and injury severity levels, this GA study will be useful in predicting the No of accidents. This data will be combined with data gathered along a specific route to determine the no of the accident. To see if GA outperforms other methods for calculating accidents. The elements that cause accidents have been researched in this work in order to provide road safety, and accident prediction models that contain relationships between these components have been constructed. The database was created using the years 2017 for the geometrical aspects of highway sections and traffic accident records. The use of Genetic Algorithm as a tool for predicting techniques was studied in this work. The comparison of model prediction performance between the target and output is assessed to determine the performance of the Genetic model.

ACKNOWLEDGMENT

We express our sincere thanks to Project Guide Dr. R.R. Sorte for his continuous support. We also thankful to our Head of Department of civil Dr. R.R. Sorte For support

REFERENCES

[1] Seyed Hessam-Allah Hashmienejad & Seyed Mohammad Hossein Hasheminejad (2017) Traffic accident severity prediction using a novel multi-objective genetic [2] Graphical Prediction of Road Accidents using Data Analysis Swetha P.C, Vaishalli A., Devika M., Sneha Balaji,Vol-4 Issue-2 2018 IJARIIE. [3] Camilo Gutierrez-Osorio*, Cesar Pedraza Departamento de Ingenierı ́a de Sistemas e Industrial, Universidad Nacional de Colombia, Bogota, Colombia,15 May 2020―Modern Data Sources And Techniques For Analysis And Forecast Of Road Accidents: A Review‖ Journal of traffic and transportation engineering, [4] QiangLu1 and Kyoung-Dae Kim2, 11 October 2017,―A Genetic Algorithm Approach For Expedited Crossing Of Emergency Vehicles In Connected And Autonomous Intersection Traffic‖ Journal of Advanced Transportation Volume 2017, Article ID 7318917, 14 pages. [5] Prediction for traffic accident severity: comparing the artificial neural network, genetic algorithm, combined genetic algorithm and pattern search methods, Mehmet Metin Kunt , Iman Aghayan & Nima Noii,21 March 2014, At: 06:19 Publisher: Taylor & Francis [6] Predicting Crash Injury Severity with Machine Learning Algorithm Synergized with Clustering Technique: A Promising Protocol Khaled Assi, Syed Masiur Rahman, Umer Mansoor and Nedal Ratrout, 27 June 2020 [7] A stable and optimized neural network model for crash injury severity prediction Qiang Zeng, Helai Huang, Received 29 May 2014 [8] A cluster analysis on road traffic accidents using genetic algorithms Sabariah Saharan and Roberto Baragona, 2017 [9] Application of Support Vector Machine for Crash Injury Severity Prediction: A Model Comparison Approach Iman Aghayan1, Mansour Hadji Hosseinlou2 , Mehmet Metin Kunt3,,Journal of Civil Engineering and Urbanism Volume 5, Issue 5: 193-199; September 25, 2015 [10] Efficient Analysis of Traffic Accident Using Mining Techniques S.Vigneswaran1 A. Arun Joseph2; E.Rajamanickam3,march 2014 [11] ANALYSIS OF ROADWAY FATAL ACCIDENTS USING ENSEMBLE-BASED [12] Prediction of Road Traffic using Naive Bayes Algorithm. Baby Anitha1,R. Aravinth2, S. Deepak3 ,RTICCT – 2019 [13] Evaluation of Accidental Death Records Using Hybrid Genetic Algorithm Nikhil Sharma, Ila Kaushik, Rajat Rathi, Santosh Kumar, International Conference on Innovative Computing and Communication (ICICC-2020) [14] A comparative study on machine learning based algorithms for prediction of motorcycle crash severity Lukuman WahabID, Haobin Jiang, April 4, 2019 [15] Overview of traffic incident duration analysis and prediction Ruimin Li, Francisco C. Pereira and Moshe E. Ben-Akiva, European Transport Research Review (2018) [16] Graphical Prediction of Road Accidents using Data Analysis Swetha P.C, Vaishalli A., Devika M., Sneha Balaji,Vol-4 Issue-2 2018 IJARIIE-ISSN(O)-2395-4396 [17] Prediction of Accident Severity Using Artificial Neural Network: A Comparison of Analytical Capabilities between Python and R Imran Chowdhury Dipto1, Md Ashiqur Rahman1, Tanzila Islam2, H M Mostafizur Rahman3,Journal of Data Analysis and Information Processing, 2020, August 7, 2020 [18] Data quality analysis of interregional travel demand: Extracting travel patterns using matrix decomposition☆ Canh Xuan Do, Makoto Tsukai, Akimasa Fujiwara a,2020 [19] Boosted Genetic Algorithm using Machine Learning for traffic control optimization Tuo Mao, Adriana-Simona Mihait Senior Member, IEEE, Fang Chen, and Hai L. Vu., March 2021 [20] Macro prediction model of road traffic accident based on neural network and genetic algorithm, QIN Liyan SHAG Chunfu, 2009 [21] Mortality-Risk Prediction Model from Road-Traffic Injury in Drunk Drivers: Machine Learning Approach, Wachiranun Sirikul, Nida Buawangpong, Ratana Sapbamrer 1 and Penprapa Siviroj, 8 October 2021,

Corresponding Author Ms. Nikita Ingawale*

PG Student, Department of Civil Engineering, TSSM‘S Padmabhooshan Vasantdada Patil Institute Technology, Pune