Commit

fix typos, add citation
Noor Buchi committed May 15, 2022
1 parent 4fe6e96 commit 004ee0d
Showing 6 changed files with 136 additions and 117 deletions.
12 changes: 6 additions & 6 deletions chapters/ch01_introduction.tex
@@ -189,7 +189,7 @@ \subsection{SBFL in Action}
from three integers. The program contains a bug on line 6 where the wrong maximum
value is detected. The figure also shows seven different test cases that send
various inputs to the function and check whether the actual output matches the
expected. The results of each test is found on the last row of the table.
expected. The results of each test are found on the last row of the table.
Additionally, the large dots under the Input Tests column illustrate the concept
of code coverage. For each line of code and test input, a dot in the cell means
that the line was executed when this input was passed. On the rightmost column
@@ -238,7 +238,7 @@ \section{Main Aims}
packages such as Pytest and Coverage.py can be used to collect test suite data
in order to calculate suspiciousness. AFLuent runs as a Pytest plugin and is
integrated with the command line interface of Pytest, this feature increases
it's accessibility by allowing developers to easily integrate it into their
its accessibility by allowing developers to easily integrate it into their
development environment.

Following the implementation of AFLuent, this research evaluates the
@@ -274,7 +274,7 @@ \section{Research Questions}
available literature on SBFL is analyzed and the most popular and cited formulas
are included in the implementation of AFLuent. Answering this question also
requires that each approach is evaluated through an experiment section.
Since this research question is includes two separate sections, it's further split into
Since this research question includes two separate sections, it's further split into
smaller sub-questions discussed below.

\begin{center}
@@ -301,7 +301,7 @@ \section{Research Questions}

To ensure correctness and effectiveness, the implemented formulas in AFLuent are
evaluated through experiments that measure their accuracy in sorting suspicious
statements and blocks. More specifically, the formulas will be assessed in in
statements and blocks. More specifically, the formulas will be assessed in
the context of Python projects that use the Pytest unit testing framework.
More details on this research question can be found in the evaluation section.

@@ -323,7 +323,7 @@ \section{Research Questions}
In addition to ensuring a smooth user experience while utilizing AFLuent
functionalities, setup process and usage of the tool AFLuent is simplified to
facilitate installation. Clear and descriptive documentation is also a crucial
step in making AFLuent accessible available for new users.
step in making AFLuent accessible and available for new users.

\section{Thesis Outline}
\label{sec:outline}
@@ -335,6 +335,6 @@ \section{Thesis Outline}
standards. The different tools used to build and test AFLuent are also
discussed in the methods sections. Following that, the evaluation section
describes the steps taken to evaluate AFLuent by testing the tool and
collecting data regarding it's output. The evaluation section also includes an
collecting data regarding its output. The evaluation section also includes an
analysis of the results of the evaluation and various plots and charts that
show the findings.
59 changes: 30 additions & 29 deletions chapters/ch02_relatedwork.tex
@@ -6,7 +6,7 @@ \chapter{Related Work}
to facilitate debugging and increase developer efficiency. Considering that
AFLuent relies on many concepts developed by this literature, this section will
explore and discuss how past work shapes AFLuent. Several sections are created
to for specific area of literature.
for specific areas of literature.

\section{Automated Fault Localization}
\label{sec:AFLlit}
@@ -29,9 +29,9 @@ \section{Automated Fault Localization}
majority of found papers are focused on Spectrum-Based Fault Localization (SBFL).
Overall this research provides a great
starting point to find and compare the different types and approaches of AFL.
Another benefit of this resources is that
Another benefit of these resources is that
Wong et al. \cite{wong2016survey} expands on the types of SBFL
and reviews key literature that contributes show the benefits and drawbacks of
and reviews key literature that helps show the benefits and drawbacks of
each approach.

Another insightful survey paper is by Idrees Sarhan et al. \cite{sarhan2022Challenges}
@@ -50,7 +50,7 @@ \subsubsection{Similarity Coefficient Based Technique}
One of the most relevant SBFL techniques described by Wong et al.
\cite{wong2016survey} is similarity coefficient based ones. Generally, these approaches
seek to quantify how close ``the execution pattern of a statement is to the
failure pattern of all test cases'', where the the closer they are the more
failure pattern of all test cases'', where the closer they are the more
likely that this statement contains the error. In order to create a
measurement of closeness, several equations have been developed and evaluated by
past literature. Figure \ref{fig:sbfl_eq} shows some of the equations reviewed by
@@ -90,14 +90,14 @@ \subsubsection{Tarantula}
what causes the numerator to grow larger. This means that an increase in failed
tests that cover the element causes an increase in suspiciousness. Additionally,
a decrease in the number of failing tests that do not cover the element also
increase suspiciousness. Considering these two points, Tarantula gives a better
increases suspiciousness. Considering these two points, Tarantula gives a better
indicator of suspiciousness when there are fewer failures in tests covering
elements not under inspection. In addition to the logical analysis of the
equation previous works provide an empirical evaluation of Tarantula in
comparison to other formulas. Jones et al. \cite{Jones2005TarantulaEval}
compares the effectiveness and efficiency of Tarantula to techniques such as Set Union, Set
Intersection, and Nearest Neighbor. The results demonstrate that Tarantula
outperform the other Techniques where it provided a better guidance to the
outperformed the other techniques, providing better guidance to the
developer. Using Tarantula a developer would need to manually
inspect fewer elements of the program compared to when using other approaches.

@@ -109,7 +109,7 @@ \subsubsection{Tarantula}
exist, the first one based on the number of failed tests covering the element,
and then the suspiciousness scores. The empirical results in Debroy et al.
\cite{debroy2010grouping} show a statistically significant improvement provided
by this grouping technique where the developer need to review less elements and
by this grouping technique where the developer needs to review fewer elements and
more faults are accurately detected. While Debroy et al. only applied the
grouping technique to Tarantula and a neural network-based approach, it could be
extended to include other similarity coefficient based techniques.
@@ -124,7 +124,7 @@ \subsubsection{Ochiai}
\label{subsubsec:ochiai_lit}

Ochiai is another similarity coefficient formula for SBFL that uses code
coverage information and test output to produce as suspiciousness score.
coverage information and test output to produce a suspiciousness score.
Originally used in computing genetic similarity in molecular biology and
evaluated in Abreu et al. \cite{Abreu2006Ochiai}, the equation for this approach
is shown in Figure \ref{fig:ochiaiEquation}. Similar to Tarantula, the number of
@@ -133,12 +133,12 @@ \subsubsection{Ochiai}
of tests that cover the element, unlike Tarantula, however, it does not consider
successful tests that do not cover the element.
Papers such as \cite{Abreu2006Ochiai,ABREU20091780} also evaluate the
performance of Ochiai in comparison to other such as Tarantula, AMPLE, and
performance of Ochiai in comparison to others such as Tarantula, AMPLE, and
Jaccard. Another evaluation of Ochiai is done by Le et al. \cite{le2013theory}
where it was found to have a statistically significant improvement when compared
to Tarantula. The paper demonstrates that on average developers only need to
inspect 21.02\% of the source code before finding the fault.
AFLuent includes and implementation and evaluation of Ochiai to
AFLuent includes an implementation and evaluation of Ochiai to
validate that it performs as expected compared to the Tarantula technique.
Additionally, considering that Ochiai is considered a fairly accurate and
effective formula to detect faults, AFLuent takes advantage of the performance
@@ -170,7 +170,7 @@ \subsubsection{DStar}
information of a program to locate and rank faults. The equation for this
approach can be found in figure \ref{fig:dstarEquation}. Wong et al.
\cite{Wong2014DStar} introduce and extensively evaluate this approach in a 2014
paper that demonstrate it's effectiveness compared to other formulas. In the
paper that demonstrates its effectiveness compared to other formulas. In the
process of constructing D*, the paper lists the factors involved in determining
suspiciousness of an element. The principles are as follows:
\begin{enumerate}
@@ -185,34 +185,34 @@ \subsubsection{DStar}
\end{enumerate}

Considering that multiplying \(\textbf{N$_{CF}$}\) by a constant to increase its
weight will not affect the ranking of statements, he authors argue that
rasing \(\textbf{N$_{CF}$}\) to a value * greater than
weight will not affect the ranking of statements, the authors argue that
raising \(\textbf{N$_{CF}$}\) to a value * greater than
or equal to 1 would be more appropriate in increasing the weight of this
variable. The study continues by illustrating how increasing the value of *
produces more clear rankings that facilitate the debugging process by requiring
the developer to examine less elements in bot the best and worst case. However,
the authors also point out that this benefit of increasing teh value of * levels
the developer to examine fewer elements in both the best and worst case. However,
the authors also point out that this benefit of increasing the value of * levels
off at a certain point depending on the size of the program under analysis.
The paper concludes by reviewing performance results showing that D* is more
effective than the previously discussed formulas (Tarantula, Ochiai, and
Ochiai2). With that in mind, D* offers the latest and most effective formula to
calculate suspiciousness compared to all others included in this research.
AFLuent implements D* to validate this step up in effectiveness in the context of
Python projects and gives the user the ability to use t.
Python projects and gives the user the ability to use it.

\subsection{Combining Approaches}
\label{subsec:combining_approaches}

While AFLuent only relies SBFL approaches in its implementations, it's
While AFLuent only relies on SBFL approaches in its implementations, it's
useful to explore other methodologies that could assist in the debugging
process. This creates a guide for potential extention of AFLuent and
process. This creates a guide for potential extension of AFLuent and
provides a way to fill in the shortcomings of AFLuent. Xuan et al. explores the
possibility of combining several SBFL metrics of fault localization and
introducing a machine learning model to assist with the ranking
\cite{Xuan2014Combine}. While AFLuent does not support this approach, Xuan et
al. shows some promising results that could potentially uncover performance
improvements in fault localization. There are many tricky aspects of this
research, especially that it suggests to train a machine learning model to
research, especially that it suggests training a machine learning model to
assist with ranking. Depending on the data used to train the model, the results
could be very different. Overall, while AFLuent does not use machine learning,
this research provides a great idea for future work and improvements.
@@ -236,7 +236,8 @@ \subsection{Acknowledging Problems}
\label{subsec:acknowledging_problems}

With the multitude of approaches and formulas to use in SBFL, various criticisms
are brought up for each proposed research. In a survey study, Wong et al.
are brought up for each proposed research. Some research even suggests that SBFL
and AFL in general are not effective for all developers \cite{parnin}. In a survey study, Wong et al.
\cite{wong2016survey} identifies a series of issues and concerns surrounding
SBFL in general. The main one being the central problem of giving failed and
successful tests accurate weights in order to produce a meaningful
@@ -251,22 +252,22 @@ \subsection{Acknowledging Problems}
One of the brought up concerns of SBFL is the inclusion of passed program
spectra in calculating suspiciousness of an element. Xie et al.
\cite{xie2010isolating} argue that while a failed program test case does
indicate the presence of an error a passed program spectra/test data, ``is not
indicate the presence of an error, a passed program spectra/test data ``is not
guaranteed to be absolutely free of any faulty statement''. With that in mind,
passed tests information alone do not give reliable results on an element
passed test information alone does not give reliable results on an element's
suspiciousness. The proposed approach to mitigate this problem is to organize
program entities into two main groups, those who have been ``activated'' at
least once by a failed program spectra, and ``clean'' ones, which have not at
all. The research continues by experimenting with this approach and presenting
results that showed some signs of improvement on existing SBFL formulas.
Overall, this research provides a way to address inaccuracies with AFLuent and
assists in expending the project beyond simple calculations based on formulas.
assists in expanding the project beyond simple calculations based on formulas.

Another concern with the use of SBFL to debug programs is the possibility of
having equal suspiciousness scores assigned to multiple statements. These ties
hinder the debugging process and present the developer with a dilemma. Which
element should be inspected first? they're equally suspicious! This problem
becomes more significant when only one of the tied elements actually contain the
becomes more significant when only one of the tied elements actually contains the
fault. A study by Xu et al. \cite{xu2011ties} recognizes this problem and
expands on the different outcomes. In the best case, the developer picks the
statement containing the fault as their first choice and finds the error right
@@ -314,7 +315,7 @@ \section{Existing Tools}
program spectra and calculates suspiciousness scores using Tarantula, Ochiai,
and DStar approaches. Overall, CharmFL has many similarities with AFLuent, but
it's also less accessible considering that it's a PyCharm plugin which is not
used by every developer. Overall, the implementation of CharmFL provides and
used by every developer. Overall, the implementation of CharmFL provides an
inspiration for AFLuent and encourages improvements where CharmFL may fall
short.

@@ -335,22 +336,22 @@ \section{Usability and Accessibility}
and verbosity of output messages from the tool. Instead of simply displaying the
ranked scores of statements, it would be more user friendly to explain the
meaning of the output to guide the user into beginning the debugging process.
Kohn \cite{kohn2019error} explores the experience of beginner with Python errors
Kohn \cite{kohn2019error} explores the experience of beginners with Python errors
with different severity and various Python interpreter error output. The results
confirm that more clear error messages tend to have a higher percentage of
students finding and fixing the error. This connection between error output and
the ability for beginner developers to fix faults is very crucial in the case of
AFLuent. And while a user survey is out of scope of this research, it's Kohn
AFLuent. And while a user survey is out of scope of this research, Kohn
provides encouragement to account for the different use cases in AFLuent and
attempt to provide a clear output that describes the fault and guides the
attempt to provide a clear output that describes the fault and guides the
developer for the next step.

Another aspiration of AFLuent is to assist beginners in debugging their code in
ways that go beyond simply looking at the suspiciousness ranking of elements. By
identifying common Python errors among beginners, the cause of faults can
more quickly be pointed out after statement ranking has been produced. These
steps require additional analysis of the suspicious statements by analyzing
their syntax to identify potential cause. The goal of AFLuent would then become
their syntax to identify potential causes. The goal of AFLuent would then become
more than simply locating the fault, but also giving an educated guess regarding
the reason behind the error. Cosman et al. \cite{cosman2020pablo} create a tool
named PABLO that uses a trained classifier to identify common bugs and faults in
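
For readers of this diff, the three suspiciousness formulas that the ch02 text describes (Tarantula, Ochiai, and D*) can be sketched in a few lines of Python. This is a minimal illustration and is not taken from AFLuent's source: the function names, parameter names, and the sample spectra are hypothetical, and the handling of zero denominators is an assumption, since the excerpt does not say how such cases are treated.

```python
import math


def tarantula(failed_cov, passed_cov, total_failed, total_passed):
    """Tarantula: failing-coverage ratio over the sum of both ratios."""
    fail_ratio = failed_cov / total_failed if total_failed else 0.0
    pass_ratio = passed_cov / total_passed if total_passed else 0.0
    denom = fail_ratio + pass_ratio
    # Assumption: an element covered by no test at all scores 0.0.
    return fail_ratio / denom if denom else 0.0


def ochiai(failed_cov, passed_cov, total_failed):
    """Ochiai: ignores passing tests that do not cover the element."""
    denom = math.sqrt(total_failed * (failed_cov + passed_cov))
    # Assumption: a zero denominator yields a score of 0.0.
    return failed_cov / denom if denom else 0.0


def dstar(failed_cov, passed_cov, total_failed, star=2):
    """D*: raises the covering-failed count to the power `star` (>= 1)."""
    denom = passed_cov + (total_failed - failed_cov)
    if denom == 0:
        # Assumption: covered by every failing test and by no passing
        # test, so the element is treated as maximally suspicious.
        return math.inf if failed_cov else 0.0
    return failed_cov ** star / denom


# Hypothetical spectra: element -> (failing tests covering it,
# passing tests covering it), with 2 failing and 5 passing tests overall.
spectra = {"line 5": (1, 4), "line 6": (2, 1), "line 7": (2, 5)}
ranking = sorted(
    spectra,
    key=lambda name: ochiai(spectra[name][0], spectra[name][1], 2),
    reverse=True,
)
```

Swapping `ochiai` for `tarantula` or `dstar` in the `key` function changes only the scoring step; elements with equal scores keep their input order here, which is one simple (assumed) way to handle the ties discussed in the Acknowledging Problems hunk.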