A Study of Web Service Recommendation for Personalized QoS Using Machine Learning
Amita Boral1*, Dr. Kishan Kumar2
1 Research Scholar, Shri Krishna University, Chhatarpur, M.P.
ouriginal.sku@gmail.com
2 Professor, Shri Krishna University, Chhatarpur, M.P.
Abstract-
This experiment addresses the challenge of predicting Quality of Service (QoS) values for web services using several regression methods: fitrtree (a binary regression decision tree), fitensemble with the LSBoost and Bag methods, and lasso regression. By leveraging past user experiences, these models recommend web services based on predicted QoS values, with a focus on minimizing prediction error. The data were collected from real-world web services across diverse locations. Four regression models were implemented, and the prediction accuracy of each was evaluated. The findings suggest that fitensemble with Bag and lasso regression predict QoS values with the lowest errors, making them the most effective for recommending web services. Experiments were conducted on a QoS dataset derived from millions of real-world web service invocations, involving web services from 29 countries and users from 31 countries. Predictions cover all users and 1,292 unique web services. Recommendations are also tailored to users on specific continents, providing regionally optimized services. The study includes an in-depth analysis of the datasets, focusing on the country-wise and continent-wise distribution of web services and users.
Keywords-
QoS, Machine Learning, Linear Regression, Prediction Accuracy, Web Services.
INTRODUCTION
It is essential to place a strong emphasis on QoS characteristics such as response time, reliability, price, failure rate, support for high-performance computing, availability, and throughput in order to improve the efficiency of web services in heterogeneous and rapidly changing environments. This is necessary to persuade end users to choose web services hosted in the cloud. Intelligently and cost-effectively transforming cloud services from diverse sources into a popular service is a difficult research challenge. This section explains an approach to this research problem.
Web service QoS is widely used to describe non-functional attributes. According to Zhang (2011), Zheng (2010), Albu (2013), and Ding (2014), quality of service is characterized by a set of properties including availability, reputation, response time, and throughput. Some QoS properties, such as response time and the availability actually experienced by users, must be measured on the user side (Zheng, 2010). Therefore, providers cannot always obtain such QoS data directly. In addition, these QoS values can be affected by the user's context (such as the user's continent or network condition) and by the unpredictable Internet environment.
To address this issue, computational prediction models can be used: fitrtree, which fits a binary regression decision tree; fitensemble, which fits an ensemble of learners for classification or regression using methods such as LSBoost or Bag; and lasso regression. These models use the available user data to forecast QoS values. Based on the experimental findings, predicting personalized QoS values provides a clear basis for recommending QoS-aware web services.
Typically, the non-functional aspects of web services are described using QoS. Although server-side QoS values are better indicators of server capacity, client-side QoS values more accurately reflect the performance that users actually experience. Prior research on web service QoS includes Buyerya (2010), Bakshi (2009), Mell (2011), Zhang (2011), and Zheng (2010). Commonly used client-side web service QoS attributes include:
• Response time: the duration between a service user's request and the corresponding response.
• Throughput: the average rate at which messages are delivered over a communication channel, usually expressed in kilobits per second (kbps).
• Failure probability: the probability that an attempt to invoke a web service will fail.
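As an illustration of these three definitions, the following Python sketch computes them from client-side invocation logs. The `Invocation` record (send and receive timestamps in seconds, response size in kilobits, and a success flag) is a hypothetical format assumed for the example; it is not the measurement procedure used to build the dataset.

```python
from dataclasses import dataclass


@dataclass
class Invocation:
    """Hypothetical client-side record of one web service call."""
    sent_at: float        # seconds, when the user sent the request
    received_at: float    # seconds, when the response arrived
    payload_kbits: float  # size of the response message in kilobits
    succeeded: bool       # False if the invocation failed


def client_side_qos(log):
    """Return (mean response time in s, throughput in kbps, failure probability)."""
    ok = [inv for inv in log if inv.succeeded]
    if not ok:
        raise ValueError("no successful invocations to measure")
    failure_probability = 1 - len(ok) / len(log)
    durations = [inv.received_at - inv.sent_at for inv in ok]
    mean_response_time = sum(durations) / len(durations)
    # Throughput: kilobits delivered divided by the time spent delivering them.
    throughput = sum(inv.payload_kbits for inv in ok) / sum(durations)
    return mean_response_time, throughput, failure_probability


log = [
    Invocation(0.0, 0.5, 40.0, True),
    Invocation(1.0, 2.5, 60.0, True),
    Invocation(3.0, 3.0, 0.0, False),
    Invocation(4.0, 5.0, 50.0, True),
]
print(client_side_qos(log))  # (1.0, 50.0, 0.25)
```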
Quality of service (QoS) data is essential for planning, performance evaluation, and prediction; it is obtained from real-world service invocations by users across a large array of domains. In the information economy, data has become an invaluable tool for organizations. Extracting insights from data with complex computational techniques requires substantial resources, so an elastic, on-demand computing environment is necessary. Because the QoS dataset is very large, failures are common in the current system. A key objective of the proposed architecture is therefore to predict missing values from users' past web service usage data. QoS concerns such as availability, response time, throughput, failure rate, high-performance computing, and highly diverse, rapidly changing environments must be addressed if users are to be satisfied. Transforming various cloud services into intelligence and distributing that intelligence to the appropriate resources at low cost is no easy feat. The experiments use QoS datasets that contain missing values.
Fitrtree
To accommodate MaxNumSplits, fitrtree splits all nodes in the current layer and then counts the number of branch nodes (Breiman, 1984). A layer is the set of nodes that are equidistant from the root node. If the number of branch nodes exceeds MaxNumSplits, fitrtree:
· Determines how many branch nodes in the current layer must be unsplit so that there are at most MaxNumSplits branch nodes.
· Sorts the branch nodes by their impurity gains.
· Unsplits the required number of least successful branches.
· Returns the decision tree grown so far.
This procedure produces maximally balanced trees.
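The unsplitting step can be sketched in Python as follows. `branches_to_unsplit` is a hypothetical helper that mirrors the documented procedure for a single layer; it is not fitrtree's internal code, and node identifiers and gains are illustrative.

```python
def branches_to_unsplit(layer_gains, n_branch_nodes, max_num_splits):
    """Choose which nodes of the just-split layer to turn back into leaves.

    layer_gains: dict mapping each node split in the current layer to its
        impurity gain.
    n_branch_nodes: number of branch nodes in the tree after splitting the layer.
    max_num_splits: the MaxNumSplits budget.
    """
    excess = n_branch_nodes - max_num_splits
    if excess <= 0:
        return set()  # within budget: keep every split
    # Only nodes in the current layer can be unsplit.
    excess = min(excess, len(layer_gains))
    # Sort the layer's branch nodes by impurity gain, highest first ...
    ranked = sorted(layer_gains, key=layer_gains.get, reverse=True)
    # ... and unsplit the `excess` least successful ones.
    return set(ranked[len(ranked) - excess:])


# 3 branch nodes in earlier layers + 4 new ones = 7 > MaxNumSplits = 5.
gains = {"a": 0.9, "b": 0.1, "c": 0.5, "d": 0.3}
print(sorted(branches_to_unsplit(gains, n_branch_nodes=7, max_num_splits=5)))  # ['b', 'd']
```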
An ensemble combines a set of trained weak learner models with the data on which these learners were trained (Breiman, 1996). It predicts the response to new data by aggregating the predictions of its weak learners. Choosing the size of an ensemble involves balancing speed and accuracy: larger ensembles take longer to train and to generate predictions, and some ensemble algorithms can become overtrained (inaccurate) when they are too large.
To determine an appropriate size, consider starting with a small ensemble (several dozen to several hundred members), training it, and then testing its quality. If more members appear to be needed, add them (in MATLAB, with the resume method for classification or regression ensembles), and repeat until adding members no longer improves ensemble quality.
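The size-selection loop described above can be sketched as follows. `validation_mse(size)` is a hypothetical callback that trains (or resumes training of) an ensemble with `size` members and returns its MSE on held-out data; the chunk size, tolerance, and toy quality curve are illustrative assumptions.

```python
def choose_ensemble_size(validation_mse, start=50, step=50, max_size=1000, tol=0.05):
    """Grow an ensemble in chunks until adding members no longer helps.

    validation_mse(size) is a hypothetical callback returning the held-out
    MSE of an ensemble with `size` members.
    """
    size, best = start, validation_mse(start)
    while size + step <= max_size:
        candidate = validation_mse(size + step)
        if best - candidate < tol:  # improvement too small: stop growing
            break
        size, best = size + step, candidate
    return size, best


# Toy quality curve with diminishing returns: MSE = 1 + 100 / size.
size, mse = choose_ensemble_size(lambda n: 1.0 + 100.0 / n)
print(size)  # 300
```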
Inputs of the ensemble-based algorithm (fitensemble):
· X is the data matrix. Each row contains one observation, and each column contains one predictor variable.
· Y is the response vector; its number of observations equals the number of rows in X.
· Method is a string indicating the type of ensemble (for example, LSBoost or Bag).
· NLearn is the number of ensemble learning cycles.
· The total number of trained weak learners in the ensemble ens is NLearn multiplied by the number of elements in Learners.
· Learners is either a string naming a weak learner or a weak-learner template (or a collection of templates).

Figure 1: Steps to Create an Ensemble
learner
LSBoost (Least-Squares Boosting) fits regression ensembles (Breiman, 1984; Hastie, 2008; Polikar, 2006). At every step, the ensemble fits a new learner to the difference between the observed response and the aggregated prediction of all learners grown previously, so as to minimize the mean squared error (MSE).
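A minimal LSBoost sketch in pure Python follows. Using one-split regression stumps as weak learners, starting from the mean response, and shrinking each step by a learning rate of 0.1 are assumptions for illustration (fitensemble exposes a comparable LearnRate name-value argument); each new stump is fit to the residuals between the observed response and the current ensemble prediction.

```python
def fit_stump(X, r):
    """Least-squares regression stump: the single split that best fits r."""
    n, best = len(r), None
    for j in range(len(X[0])):
        values = sorted({row[j] for row in X})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [r[i] for i in range(n) if X[i][j] <= t]
            right = [r[i] for i in range(n) if X[i][j] > t]
            ml, mr = sum(left) / len(left), sum(right) / len(right)
            sse = sum((v - ml) ** 2 for v in left) + sum((v - mr) ** 2 for v in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, ml, mr)
    if best is None:  # no predictor varies: fall back to the mean
        m = sum(r) / n
        return lambda x: m
    _, j, t, ml, mr = best
    return lambda x: ml if x[j] <= t else mr


def lsboost(X, y, n_learn=100, learn_rate=0.1):
    """Least-squares boosting: each new stump is fit to the current residuals."""
    base = sum(y) / len(y)  # start from the mean response
    pred = [base] * len(y)
    learners = []
    for _ in range(n_learn):
        residuals = [yi - pi for yi, pi in zip(y, pred)]  # observed minus ensemble
        stump = fit_stump(X, residuals)
        learners.append(stump)
        pred = [pi + learn_rate * stump(x) for pi, x in zip(pred, X)]
    return lambda x: base + learn_rate * sum(h(x) for h in learners)


def mse(y, yhat):
    return sum((a - b) ** 2 for a, b in zip(y, yhat)) / len(y)


X = [[1], [2], [3], [4], [5], [6]]
y = [1, 1, 1, 5, 5, 5]
for n in (1, 10, 100):
    model = lsboost(X, y, n_learn=n)
    print(n, mse(y, [model(x) for x in X]))  # training MSE shrinks as n grows
```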
A regression bagged ensemble trains each weak learner on a bootstrap replica of the data and predicts the ensemble's response by aggregating the predictions of its weak learners (Breiman, 1996). Because each learner is trained without some observations (its out-of-bag observations), the generalization error can be estimated without additional cross-validation.
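The following sketch shows bagging with an out-of-bag (OOB) error estimate. To keep it short, the weak learner is a least-squares line on the first predictor, which is an assumption for illustration only; bagged regression ensembles in MATLAB are usually built from decision trees. The function names are hypothetical.

```python
import random


def fit_line(X, y):
    """Weak learner for the sketch: least-squares line on the first predictor."""
    xs = [row[0] for row in X]
    mx, my = sum(xs) / len(xs), sum(y) / len(y)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (v - my) for x, v in zip(xs, y)) / sxx if sxx else 0.0
    return lambda row: my + slope * (row[0] - mx)


def bagged_ensemble(X, y, fit_learner, n_learn=50, seed=0):
    """Fit learners on bootstrap replicas; return (predict, out-of-bag MSE)."""
    rng = random.Random(seed)
    n = len(y)
    learners, oob_sets = [], []
    for _ in range(n_learn):
        idx = [rng.randrange(n) for _ in range(n)]  # sample n rows with replacement
        learners.append(fit_learner([X[i] for i in idx], [y[i] for i in idx]))
        oob_sets.append(set(range(n)) - set(idx))   # rows this learner never saw

    def predict(x):
        return sum(h(x) for h in learners) / len(learners)  # average the learners

    # Out-of-bag error: score each row only with learners that did not train on it,
    # which estimates generalization error without extra cross-validation.
    sq_errors = []
    for i in range(n):
        preds = [h(X[i]) for h, oob in zip(learners, oob_sets) if i in oob]
        if preds:
            sq_errors.append((y[i] - sum(preds) / len(preds)) ** 2)
    oob_mse = sum(sq_errors) / len(sq_errors) if sq_errors else float("nan")
    return predict, oob_mse


X = [[x] for x in range(10)]
y = [2 * x + 1 for x in range(10)]
predict, oob_mse = bagged_ensemble(X, y, fit_line)
print(round(predict([20]), 6), round(oob_mse, 6))  # 41.0 0.0
```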
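One way to perform linear regression is the lasso, which stands for Least Absolute Shrinkage and Selection Operator (Tibshirani, 1996).
Given a collection of input dimensions $X_1, X_2, \ldots, X_n$ and an output dimension $Y$, the lasso fits the linear model

$$\hat{Y} = b_0 + b_1 X_1 + b_2 X_2 + \cdots + b_n X_n \tag{1}$$

The criterion it uses is:

$$\text{minimize} \; \sum \left( Y - \hat{Y} \right)^2 \quad \text{subject to} \quad \sum_{j=1}^{n} \lvert b_j \rvert \le s \tag{2}$$

where the first sum is taken over the observations in the dataset. The bound s is a tuning parameter. When s is large enough, the constraint has no effect, and the solution is the ordinary multiple linear least-squares regression of $Y$ on $X_1, X_2, \ldots, X_n$. For smaller values of s (s ≥ 0), the solutions are shrunken versions of the least-squares estimates, and some coefficients $b_j$ are often exactly zero.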
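Equation (2) is commonly solved in its equivalent penalized form, minimizing $\frac{1}{2}\sum (Y - \hat{Y})^2 + \lambda \sum_j \lvert b_j \rvert$, where each $\lambda \ge 0$ corresponds to some bound s. The sketch below uses cyclic coordinate descent with soft-thresholding, a standard solver for this form; it is not necessarily the algorithm MATLAB's lasso uses internally, and the data are illustrative.

```python
def soft_threshold(z, gamma):
    """Shrink z toward zero by gamma, returning exactly 0 inside [-gamma, gamma]."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso(X, y, lam, n_iter=200):
    """Minimize 0.5 * sum((y - b0 - X b)**2) + lam * sum(|b_j|) by coordinate descent."""
    n, p = len(X), len(X[0])
    # Center predictors and response so the intercept b0 is not penalized.
    means = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - means[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    col_sq = [sum(r[j] ** 2 for r in Xc) for j in range(p)]
    b = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0:
                continue  # constant predictor: coefficient stays 0
            # Inner product of predictor j with the residual that excludes predictor j.
            rho = sum(
                Xc[i][j] * (yc[i] - sum(Xc[i][k] * b[k] for k in range(p) if k != j))
                for i in range(n)
            )
            b[j] = soft_threshold(rho, lam) / col_sq[j]
    b0 = y_mean - sum(b[j] * means[j] for j in range(p))
    return b0, b


X = [[1, 2], [2, 1], [3, 2], [4, 1], [5, 2]]
y = [3 * row[0] + 1 for row in X]  # Y depends only on X1
for lam in (0.0, 10.0, 40.0):
    b0, b = lasso(X, y, lam)
    # lam=0 recovers least squares (b close to [3, 0]); lam=10 shrinks b1 to 2;
    # lam=40 sets every coefficient to zero.
    print(lam, round(b0, 6), [round(v, 6) for v in b])
```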
Stepwise Forward Regression Algorithm:
• Start with all coefficients $b_j = 0$.
• Find the predictor $X_j$ most correlated with $Y$ and add it to the model.
• Obtain the residuals $r = Y - \hat{Y}$.
• At each step, add to the model the predictor most correlated with $r$, until all predictors are in the model.
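The forward stepwise steps above can be sketched as follows. Refitting the selected predictors by ordinary least squares at each step (through the normal equations) is an assumption about how $\hat{Y}$ is obtained, and the helper names and data are illustrative.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def fitted_values(X, y, cols):
    """Least-squares fit of y on an intercept plus the selected columns."""
    D = [[1.0] + [row[j] for j in cols] for row in X]
    k = len(D[0])
    A = [[sum(d[a] * d[c] for d in D) for c in range(k)] for a in range(k)]
    rhs = [sum(d[a] * v for d, v in zip(D, y)) for a in range(k)]
    coef = solve(A, rhs)
    return [sum(c * v for c, v in zip(coef, d)) for d in D]


def corr(u, v):
    """Pearson correlation; 0.0 if either vector is constant."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = sum((a - mu) ** 2 for a in u) ** 0.5
    sv = sum((b - mv) ** 2 for b in v) ** 0.5
    return cov / (su * sv) if su and sv else 0.0


def forward_stepwise(X, y):
    """Return predictor indices in the order they enter the model."""
    p = len(X[0])
    selected = []
    fitted = [0.0] * len(y)  # all coefficients b_j start at 0
    while len(selected) < p:
        r = [v - f for v, f in zip(y, fitted)]  # residuals r = Y - Y_hat
        remaining = [j for j in range(p) if j not in selected]
        best = max(remaining, key=lambda j: abs(corr([row[j] for row in X], r)))
        selected.append(best)
        fitted = fitted_values(X, y, selected)
    return selected


X = [[1, 0, 1], [2, 1, 1], [3, 0, 2], [4, 1, 2], [5, 0, 1], [6, 1, 1]]
y = [row[0] + 10 * row[1] for row in X]
print(forward_stepwise(X, y))  # [1, 0, 2]
```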
For real-world web service requests to the cloud, missing value prediction is an essential task. Prediction accuracy is strongly affected by the sparseness of the user-web service matrix, which typically holds the response time and throughput QoS attributes. Predicting the missing values of the QoS matrix can therefore improve prediction accuracy for active users. For this reason, we used MATLAB's regression-based models for missing value prediction to increase the density of the matrix. Eight QoS matrices were used to forecast the values, covering three user groups: all users, a normalized subset, and continent-wise users. These preprocessed QoS datasets were used for missing value prediction across all user groups.
The problem of QoS-aware web service recommendation has grown in importance with the rapid rise of cloud-based web services over the past decade. Because gathering QoS information about web services is laborious and sometimes impracticable, the QoS values of some services are often missing. Traditional methods that forecast missing QoS values from a static user-item matrix model are inefficient and prone to errors. We therefore employed models that perform better than several classic forecasting models, allowing web services to be recommended to users accurately and efficiently. These models are compared in the experimental comparison section.
CONTINENT DEFINITIONS AND FEATURES
In a recommender system with u users and w web services, the relationship between web services and users can be represented by a u × w user-item matrix. Each entry (i, j) of this matrix holds a QoS value that user i has measured for web service j; for example, $r_{u_i w_j}$ denotes response time and $t_{u_i w_j}$ denotes throughput. A user continent is a group of users from the same continent (North America, South America, Europe, or Asia). A web service continent is the set of web services located on one of five continents: North America, South America, Europe, Asia, or Australia. Every user and every web service is located on exactly one continent.

Figure 2: User-Continent Interaction with
Web Services
The purpose of the experiments was to evaluate the prediction accuracy of several state-of-the-art methodologies and important regression approaches. The experiments were designed to address the following questions: 1) Can the approach be used to recommend web services to users on the same continent? 2) How does dataset density affect prediction accuracy?
Collecting data on commercial web services from all over the world is a challenging task in practice. Consequently, we used the web service dataset compiled by Zhang (2011), Ding (2014), and Zheng (2013). To gather real-world invocations of web services by active users, they used PlanetLab (Chun, 2015), a globally distributed research network. Table 1 provides information about the dataset. They built two matrices, rtmatrix and tpmatrix, containing response time and throughput values for various users across various web services, where $RT_{i,j}$ in rtmatrix is user i's response time for web service j and $TP_{i,j}$ in tpmatrix is user i's throughput for web service j.
The QoS attributes were preprocessed, and eight new matrices, labelled dataset1 through dataset8, were generated for each attribute using a MATLAB tool that removes columns with missing values. The matrices differ in size. Dataset1 was extracted from the original dataset, and datasets 2-8 were then extracted from dataset1. Each of the two QoS datasets (response time and throughput) contained 437,988 QoS-value invocations, and the corresponding complete QoS matrices were 339 × 1,292. Because these matrices have no missing values, they are well suited to training MATLAB learning models to predict the target value of each column. Table 1 provides a detailed statistical characterization of preprocessed dataset1.
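The column-removal step and the Table 1 statistics can be sketched as follows. Missing entries are assumed to be marked as None (the marker used in the raw data files is not specified here), the toy values are illustrative, and the sketch does not reproduce the MATLAB tool itself. For the full data, 339 users × 1,292 web services gives the 437,988 invocations listed in Table 1.

```python
import statistics


def drop_incomplete_columns(matrix):
    """Keep only web-service columns observed by every user (None marks missing)."""
    keep = [j for j in range(len(matrix[0])) if all(row[j] is not None for row in matrix)]
    return [[row[j] for j in keep] for row in matrix], keep


def describe(matrix):
    """Summary statistics in the style of Table 1."""
    values = [v for row in matrix for v in row]
    return {
        "mean": statistics.mean(values),
        "std": statistics.stdev(values),  # N - 1 normalization, like MATLAB's std
        "max": max(values),
        "min": min(values),
        "invocations": len(values),
    }


rt = [[0.5, None, 1.0], [0.7, 2.0, 1.2]]  # toy response times (s), 2 users x 3 services
complete, kept = drop_incomplete_columns(rt)
print(kept, describe(complete))
```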
Table 1: WS QoS-Dataset Statistics

| Statistic | Response Time | Throughput |
|---|---|---|
| Range of values | 0-20 s | 0.06-1000 kbps |
| Mean value | 0.6525 s | 27.3271 kbps |
| Standard deviation | 0.7061 s | 30.6155 kbps |
| Maximum value | 19.9520 s | 1000 kbps |
| Minimum value | 0.0020 s | 0.0600 kbps |
| Number of users' countries | 31 | 31 |
| Number of web services' countries | 29 | 29 |
| Number of users | 339 | 339 |
| Number of web services | 1292 | 1292 |
| Number of invocations | 437988 | 437988 |

• Dataset1: Extracted directly from the original dataset collected on PlanetLab and preprocessed by deleting the columns with missing values; its size is 339 × 1292. It is the largest of the eight datasets.
• Dataset2: A normalized (balanced) dataset of size 105 × 320, containing 64 web services from each of the five continents (64 × 5 = 320) and 35 users from each of the three main user continents (35 × 3 = 105).
• Dataset3: All web services, recommended to Asian users only. Its size is 35 × 1292.
• Dataset4: Only Asian web services, recommended to Asian users. Its size is 35 × 245.
• Dataset5: All web services, recommended to European users only. Its size is 123 × 1292.
• Dataset6: Only European web services, recommended to European users. Its size is 123 × 472.
• Dataset7: All web services, recommended to North American users only. Its size is 175 × 1292.
• Dataset8: Only North American web services, recommended to North American users. Its size is 175 × 419.
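The continent-based datasets can be sketched as row and column selections from the full matrix. The per-user and per-service continent labels are assumed inputs, the function names are hypothetical, and taking the first n users or services per continent for the balanced Dataset2 is an assumption, since the selection rule is not stated.

```python
def continent_subset(matrix, user_cont, service_cont, users_in, services_in=None):
    """Rows for users on one continent; columns for all services or one continent."""
    rows = [i for i, c in enumerate(user_cont) if c == users_in]
    cols = [j for j, c in enumerate(service_cont)
            if services_in is None or c == services_in]
    return [[matrix[i][j] for j in cols] for i in rows]


def balanced_subset(matrix, user_cont, service_cont,
                    user_conts, n_users, service_conts, n_services):
    """Take the first n_users per user continent and n_services per service continent."""
    rows = [i for c in user_conts
            for i in [k for k, u in enumerate(user_cont) if u == c][:n_users]]
    cols = [j for c in service_conts
            for j in [k for k, s in enumerate(service_cont) if s == c][:n_services]]
    return [[matrix[i][j] for j in cols] for i in rows]


M = [[1, 2, 3, 4, 5], [6, 7, 8, 9, 10], [11, 12, 13, 14, 15], [16, 17, 18, 19, 20]]
users = ["AS", "EU", "AS", "NA"]
services = ["AS", "EU", "AS", "NA", "AU"]
print(continent_subset(M, users, services, "AS"))        # like Dataset3: Asian users, all services
print(continent_subset(M, users, services, "AS", "AS"))  # like Dataset4: [[1, 3], [11, 13]]
print(balanced_subset(M, users, services, ["AS", "EU", "NA"], 1,
                      ["AS", "EU", "NA", "AU"], 1))      # like Dataset2, 1 per continent
```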
Table 2: Preprocessed dataset

| Attribute | Value |
|---|---|
| Number of users | 339 |
| Number of web services | 1292 |
| Number of users' countries | 31 |
| Number of web services' countries | 29 |
| Matrix size of dataset | 339 × 1292 |

Table 3: Distribution of users by continent

| Continent | Number of users |
|---|---|
| Asia (AS) | 35 |
| Europe (EU) | 123 |
| North America (NA) | 175 |
| South America (SA) | 6 |

Table 4: Distribution of web services by continent

| Continent | Number of web services |
|---|---|
| Asia (AS) | 245 |
| Europe (EU) | 472 |
| North America (NA) | 419 |
| South America (SA) | 92 |
| Australia (AU) | 64 |

Metric for Evaluating
Performance
To compare the prediction quality of the different regression algorithms, the mean squared error (MSE) metric was used; a smaller MSE indicates better performance. MSE is defined as

$$\mathrm{MSE} = \frac{1}{N} \sum_{i,j} \left( U_i W_j - \widehat{U_i W_j} \right)^2$$

where $U_i W_j$ is the QoS value that user i observed for web service j, $\widehat{U_i W_j}$ is the predicted QoS value, and N is the total number of predicted values. In other words, MSE averages the squared differences between the predicted and actual values.
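The metric can be sketched in Python as follows. `mean_column_mse` averages the per-column (per-web-service) MSEs, matching the "mean over all columns" aggregation stated for Table 5; when every column has the same number of rows, this equals the overall MSE over all N predicted values. Function names and values are illustrative.

```python
def mse(observed, predicted):
    """Mean squared error over N predicted values."""
    return sum((u - p) ** 2 for u, p in zip(observed, predicted)) / len(observed)


def mean_column_mse(observed, predicted):
    """Average of the per-column (per-web-service) MSE values."""
    n_cols = len(observed[0])
    col_mses = [
        mse([row[j] for row in observed], [row[j] for row in predicted])
        for j in range(n_cols)
    ]
    return sum(col_mses) / n_cols


observed = [[1.0, 2.0], [3.0, 4.0]]
predicted = [[1.0, 2.5], [2.0, 4.0]]
print(mean_column_mse(observed, predicted))  # 0.3125
```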
Prediction Performance on the Datasets Using 10-Fold Cross-Validation
• For throughput, the lasso approach produces the lowest MSE on every dataset under default settings; it also gives the lowest response-time MSE on Dataset3 and Dataset4.
• For response time, the MSEs of fitensemble (LSBoost), fitensemble (Bag), and lasso are lower on the full Dataset1 (1,292 web services) than on the normalized Dataset2 (320 web services), suggesting that including more web services as predictors can improve prediction accuracy; fitrtree is the exception.
• Grouping users by continent and recommending local web services generally improves prediction accuracy: every method has a lower MSE on Dataset4 than on Dataset3 and on Dataset8 than on Dataset7. For European users, restricting to local services lowers the throughput MSE (Dataset6 vs. Dataset5) but raises the response-time MSE.
• Compared with the prediction approaches evaluated by Chen (2013), the lasso and fitensemble (Bag) models outperformed IPCC, UPCC, WSRec, CBRec, RegionKNN, and LoRec.
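The 10-fold cross-validation protocol can be sketched as follows. Shuffling with a fixed seed and pooling the squared errors across folds are assumptions (the exact fold assignment and aggregation used in the study are not stated), and `fit_mean` is a trivial hypothetical learner used only to exercise the loop.

```python
import random


def kfold_indices(n, k=10, seed=0):
    """Shuffle row indices and deal them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::k] for f in range(k)]


def cross_val_mse(X, y, fit, k=10, seed=0):
    """Pooled k-fold MSE: each row is predicted by a model trained without its fold."""
    sq_err = []
    for test in kfold_indices(len(y), k, seed):
        held_out = set(test)
        train = [i for i in range(len(y)) if i not in held_out]
        model = fit([X[i] for i in train], [y[i] for i in train])
        sq_err += [(y[i] - model(X[i])) ** 2 for i in test]
    return sum(sq_err) / len(sq_err)


def fit_mean(X, y):
    """Trivial baseline learner: always predict the training mean."""
    m = sum(y) / len(y)
    return lambda x: m


X = [[i] for i in range(10)]
y = list(range(10))
print(cross_val_mse(X, y, fit_mean))  # 10 rows, 10 folds: leave-one-out
```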
Table 5: MSE values for the normalized dataset (105 × 320) and the entire dataset (339 × 1292); each value is the mean of the MSE values over all columns

| QoS property | Regression method | Full preprocessed Dataset1 (339 × 1292) | Normalized Dataset2 (105 × 320) |
|---|---|---|---|
| Response time | Fitrtree | 0.8755 | 0.8681 |
| Response time | Fitensemble (LSBoost) | 0.8702 | 0.9162 |
| Response time | Fitensemble (Bag) | 0.5444 | 0.5820 |
| Response time | Lasso | 0.6097 | 0.6644 |
| Throughput | Fitrtree | 508.0189 | 383.97 |
| Throughput | Fitensemble (LSBoost) | 643.5024 | 482.2403 |
| Throughput | Fitensemble (Bag) | 365.0933 | 311.4466 |
| Throughput | Lasso | 304.9751 | 234.8782 |

Table 6: MSE values for web service recommendations based on the user's continent

| QoS property | Regression method | Asia: Dataset3 (35 × 1292) | Asia: Dataset4 (35 × 245) | Europe: Dataset5 (123 × 1292) | Europe: Dataset6 (123 × 472) | North America: Dataset7 (175 × 1292) | North America: Dataset8 (175 × 419) |
|---|---|---|---|---|---|---|---|
| Response time | Fitrtree | 1.4462 | 0.9167 | 0.9011 | 1.2011 | 0.8147 | 0.7233 |
| Response time | Fitensemble (LSBoost) | 1.6343 | 1.0689 | 0.9309 | 1.2736 | 0.8499 | 0.8102 |
| Response time | Fitensemble (Bag) | 1.2545 | 0.8917 | 0.5708 | 0.8266 | 0.5267 | 0.4995 |
| Response time | Lasso | 0.9971 | 0.7267 | 0.6885 | 1.0389 | 0.5674 | 0.5358 |
| Throughput | Fitrtree | 165.8760 | 16.6473 | 339.2962 | 133.4694 | 726.7981 | 292.6088 |
| Throughput | Fitensemble (LSBoost) | 149.2656 | 29.2060 | 383.4500 | 137.0153 | 937.6143 | 445.2717 |
| Throughput | Fitensemble (Bag) | 130.7990 | 12.8699 | 248.3225 | 99.0320 | 534.0454 | 240.0586 |
| Throughput | Lasso | 102.9153 | 12.2483 | 197.9621 | 93.8626 | 426.7096 | 168.1700 |

|
Effects of distribution
by continent
The values of some QoS attributes, such as web service throughput and response time, differ greatly depending on the user's location and Internet connection. Users and web services were therefore grouped by continent. If web services are available at a local data center, smaller continent-specific datasets can be used, improving prediction precision. It is preferable to recommend web services hosted in local data centers to users in a given location; for instance, Google operates data centers in Asia, Europe, the Americas, and elsewhere. Recommending web services to Asian users is one such example.
Recommending services from data centers on the users' own continent improves prediction accuracy for users across that continent. For response time predicted with fitensemble (Bag), the best-performing algorithm for this attribute, recommendations for Asian users based on all web services (Dataset3, a large dataset) had the lowest accuracy (the highest MSE, 1.2545). For throughput as well, recommendations for Asian users based on all web services were less accurate than those based on Asian web services only (Dataset3 vs. Dataset4).

Figure
3: Users' Distribution by Continent

Figure 4: Continental Web Service
Distribution
Figure 5: MSE Distribution for
Response Time across all datasets using Fitensemble (Bag)

Figure 6: Lasso distribution of
MSE for throughput across all datasets

Figure 7: The fitensemble (Bag)
distribution of MSE for response times across all datasets

Figure 8: Distribution of Lasso MSE for Throughput across All Datasets
Effects of predicting a
missing value
To predict the missing values in the training matrix and make it denser, missing value prediction draws on similar web services and users. The proposed prediction model uses a training matrix and a target value vector: the training matrix has size u × (w - 1), and the target vector has size u × 1. That is, for each web service column, the values of the remaining web services serve as predictors and that column's values serve as the target. We used regression models to forecast web service QoS values, assuming that users do not have access to these metrics.
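The training-matrix and target-vector setup can be sketched for a single column as follows. This minimal sketch uses linear least squares with a tiny ridge term as a stand-in for the four MATLAB regression models compared in this study; the function names, the None marker for missing values, and the assumption that the other columns are fully observed are all hypothetical.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def fill_column(matrix, j, ridge=1e-6):
    """Predict the missing (None) entries of column j from the other w - 1 columns.

    Assumes the other columns are fully observed. Users with a known value in
    column j provide the training rows (predictors) and the target values.
    """
    others = [k for k in range(len(matrix[0])) if k != j]
    design = [[1.0] + [row[k] for k in others] for row in matrix]  # intercept + predictors
    train = [i for i, row in enumerate(matrix) if row[j] is not None]
    p = len(design[0])
    # Normal equations with a small ridge term (not applied to the intercept) for stability.
    A = [[sum(design[i][a] * design[i][c] for i in train)
          + (ridge if a == c and a > 0 else 0.0)
          for c in range(p)] for a in range(p)]
    rhs = [sum(design[i][a] * matrix[i][j] for i in train) for a in range(p)]
    coef = solve(A, rhs)
    return [row[j] if row[j] is not None
            else sum(c * x for c, x in zip(coef, design[i]))
            for i, row in enumerate(matrix)]


qos = [[1, 0, 1], [0, 1, 2], [1, 1, 3], [2, 1, 4], [3, 2, None]]  # column 2 = col0 + 2 * col1
print(round(fill_column(qos, 2)[4], 3))  # 7.0
```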
To determine which regression method recommends web services most accurately, four algorithms were compared to examine the effects of missing value prediction. In the experiments, the most active users in the real dataset were selected, and the denser, more widely used web services were chosen from both QoS datasets. The two QoS datasets (response time and throughput) had 1,445 and 1,347 complete web services, respectively. However, this study uses only the 1,292 web services common to both datasets, because those with missing values were eliminated. Consequently, a complete, dense 339 × 1292 matrix was used for each of the two QoS datasets. In the first step, the prediction model was applied to all eight datasets for both QoS parameters, throughput and response time. By applying missing value prediction to four logically distributed datasets, we forecast user values by continent and for all web services. The following datasets were used:
• Missing value prediction (1) for the full dataset, (2) for the normalized dataset, in which each continent contributes an equal number of users or web services, and (3) for users on a single continent who access web services distributed globally.
• Missing value prediction for users on a single continent who use web services located on the same continent.
CONCLUSION
Regarding the recommendation of personalized web services, it was first found that lasso regression performed best for throughput across all datasets. For response time, fitensemble (with the Bag ensemble method) outperformed the other algorithms on all dataset versions except dataset3 and dataset4, for which the lasso approach gave the best results. Second, recommending Asian web services to Asian users was found to be more accurate and appropriate. Users on the same continent would benefit most from continent-wise service recommendations when services are available at a local data center on that continent, because their requests travel a shorter network distance and cost less to serve. When services are not available at a local data center on the continent, services from the main data center or from data centers on other continents may be used instead.
References
1. Ardagna, D., Casale, G., Ciavotta, M., Pérez, J. F., & Wang, W. (2014). Quality-of-service in cloud computing: modeling techniques and their applications. Journal of Internet Services and Applications, 5(1), 11.
3. Ergu, D., Kou, G., Peng, Y., Shi, Y., & Shi, Y. (2013). The analytic hierarchy process: task scheduling and resource allocation in cloud computing environment. The Journal of Supercomputing, 64, 835-848.
4. Kumar, N., & Saxena, S. (2015). A preference-based resource allocation in cloud computing systems. Procedia Computer Science, 57, 104-111.
5. Moura, J., & Hutchison, D. (2016). Review and analysis of networking challenges in cloud computing. Journal of Network and Computer Applications, 60, 113-129.
6. Nema, P., Choudhary, S., & Nema, T. (2015). VM consolidation technique for green cloud computing. Int J Comput Sci Inf Technol, 6, 4620-4624.
7. Praveen, S. P., Rao, K. T., & Janakiramaiah, B. (2017). Effective allocation of resources and task scheduling in cloud environment using social group optimization. Arabian Journal for Science and Engineering, 1-8.
8. Calheiros, R. N., Buyya, R., & De Rose, C. A. F. (2009). A heuristic for mapping virtual machines and links in emulation testbeds. In Proceedings of the 38th International Conference on Parallel Processing, Vienna, Austria.
9. Sharkh, M. A., Ouda, A., & Shami, A. (2013). A resource scheduling model for cloud computing data centers. In 2013 9th International Wireless Communications and Mobile Computing Conference (IWCMC) (pp. 213-218). IEEE.
10. Xiong, A., & Xu, C. (2014). Energy efficient multiresource allocation of virtual machine based on PSO in cloud data center. Mathematical Problems in Engineering, 2014.
11. Ye, K., Jiang, X., Huang, D., Chen, J., & Wang, B. (2011, July). Live migration of multiple virtual machines with resource reservation in cloud computing environments. In 2011 IEEE 4th International Conference on Cloud Computing (pp. 267-274). IEEE.
12. Zheng, Z., Ma, H., Lyu, M. R., & King, I. (2013). Collaborative Web service QoS prediction via neighborhood integrated matrix factorization. IEEE Transactions on Services Computing, 6(3), 289-299.