diff --git a/chapters/ch01_introduction.tex b/chapters/ch01_introduction.tex index 14805aa..9d0d6b1 100755 --- a/chapters/ch01_introduction.tex +++ b/chapters/ch01_introduction.tex @@ -189,7 +189,7 @@ \subsection{SBFL in Action} from three integers. The program contains a bug on line 6 where the wrong maximum value is detected. The figure also shows seven different test cases that send various inputs to the function and check whether the actual output matches the -expected. The results of each test is found on the last row of the table. +expected. The results of each test are found on the last row of the table. Additionally, the large dots under the Input Tests column illustrate the concept of code coverage. For each line of code and test input, a dot in the cell means that the line was executed when this input was passed. On the rightmost column @@ -238,7 +238,7 @@ \section{Main Aims} packages such as Pytest and Coverage.py can be used to collect test suite data in order to calculate suspiciousness. AFLuent runs as a Pytest plugin and is integrated with the command line interface of Pytest, this feature increases -it's accessibility by allowing developers to easily integrate it into their +its accessibility by allowing developers to easily integrate it into their development environment. Following the implementation of AFLuent, this research evaluates the @@ -274,7 +274,7 @@ \section{Research Questions} available literature on SBFL is analyzed and the most popular and cited formulas are included in the implementation of AFLuent. Answering this question also requires that each approach is evaluated through an experiment section. -Since this research question is includes two separate sections, it's further split into +Since this research question includes two separate sections, it's further split into smaller sub-questions discussed below.
\begin{center} @@ -301,7 +301,7 @@ \section{Research Questions} To ensure correctness and effectiveness, the implemented formulas in AFLuent are evaluated through experiments that measure their accuracy in sorting suspicious -statements and blocks. More specifically, the formulas will be assessed in in +statements and blocks. More specifically, the formulas will be assessed in the context of Python projects that use the Pytest unit testing framework. More details on this research question can be found in the evaluation section. @@ -323,7 +323,7 @@ \section{Research Questions} In addition to ensuring a smooth user experience while utilizing AFLuent functionalities, setup process and usage of the tool AFLuent is simplified to facilitate installation. Clear and descriptive documentation is also a crucial -step in making AFLuent accessible available for new users. +step in making AFLuent accessible and available for new users. \section{Thesis Outline} \label{sec:outline} @@ -335,6 +335,6 @@ \section{Thesis Outline} standards. The different tools used to build and test AFLuent are also discussed in the methods sections. Following that, the evaluation section describes the steps taken to evaluate AFLuent by testing the tool and -collecting data regarding it's output. The evaluation section also includes an +collecting data regarding its output. The evaluation section also includes an analysis of the results of the evaluation and various plots and charts that show the findings. diff --git a/chapters/ch02_relatedwork.tex b/chapters/ch02_relatedwork.tex index cd97adf..b14aa1c 100755 --- a/chapters/ch02_relatedwork.tex +++ b/chapters/ch02_relatedwork.tex @@ -6,7 +6,7 @@ \chapter{Related Work} to facilitate debugging and increase developer efficiency. Considering that AFLuent relies on many concepts developed by this literature, this section will explore and discuss how past work shapes AFLuent. Several sections are created -to for specific area of literature. 
+for specific areas of literature. \section{Automated Fault Localization} \label{sec:AFLlit} @@ -29,9 +29,9 @@ \section{Automated Fault Localization} majority of found papers are focused on Spectrum-Based Fault Localization (SBFL). Overall this research provides a great starting point to find and compare the different types and approaches of AFL. -Another benefit of this resources is that +Another benefit of these resources is that Wong et al. \cite{wong2016survey} expands on the types of SBFL -and reviews key literature that contributes show the benefits and drawbacks of +and reviews key literature that helps show the benefits and drawbacks of each approach. Another insightful survey paper is by Idrees Sarhan et. al \cite{sarhan2022Challenges} @@ -50,7 +50,7 @@ \subsubsection{Similarity Coefficient Based Technique} One of the most relevant SBFL techniques described by Wong et al. \cite{wong2016survey} is similarity coefficient based ones. Generally, these approaches seek to quantify how close ``the execution pattern of a statement is to the -failure pattern of all test cases'', where the the closer they are the more +failure pattern of all test cases'', where the closer they are the more likely that this statement to contain the error. In order to create a measurement of closeness, several equations have been developed and evaluated by past literature. Figure \ref{fig:sbfl_eq} shows some of the equations reviewed by @@ -90,14 +90,14 @@ \subsubsection{Tarantula} what causes the numerator to grow larger. This means that an increase in failed tests that cover the element cause an increase in suspiciousness. Additionally, a decrease in the number of failing tests that do not cover the element also -increase suspiciousness. Considering these two points, Tarantula gives a better +increases suspiciousness.
Considering these two points, Tarantula gives a better indicator of suspiciousness when there are fewer failures in tests covering elements not under inspection. In addition to the logical analysis of the equation previous works provide an empirical evaluation of Tarantula in comparison to other formulas. Jones et al. \cite{Jones2005TarantulaEval} compares the effectiveness and efficiency of Tarantula to techniques such as Set Union, Set Intersection, and Nearest Neighbor. The results demonstrate that Tarantula -outperform the other Techniques where it provided a better guidance to the +outperformed the other techniques where it provided better guidance to the developer. Using Tarantula a developer would need to manually inspect fewer elements of the program compared to when using other approaches. @@ -109,7 +109,7 @@ \subsubsection{Tarantula} exist, the first one based on the number of failed tests covering the element, and then the suspiciousness scores. The empirical results in Debroy et al. \cite{debroy2010grouping} show a statistically significant improvement provided -by this grouping technique where the developer need to review less elements and +by this grouping technique where the developer needs to review fewer elements and more faults are accurately detected. While Debroy et al. only applied the grouping technique to Tarantula and a neural network-based approach, it could be extended to include other similarity coefficient based techniques. @@ -124,7 +124,7 @@ \subsubsection{Ochiai} \label{subsubsec:ochiai_lit} Ochiai is another similarity coefficient formula for SBFL that uses code -coverage information and test output to produce as suspiciousness score. +coverage information and test output to produce a suspiciousness score. Originally used in computing genetic similarity in molecular biology and evaluated in Abreu et al. \cite{Abreu2006Ochiai}, the equation for this approach is shown in fog.\ref{fig:ochiaiEquation}.
Similar to Tarantula, the number of @@ -133,12 +133,12 @@ \subsubsection{Ochiai} of tests that cover the element, unlike Tarantula, however, it does not consider successful tests that do not cover the element. Papers such as \cite{Abreu2006Ochiai,ABREU20091780} also evaluate the -performance of Ochiai in comparison to other such as Tarantula, AMPLE, and +performance of Ochiai in comparison to others such as Tarantula, AMPLE, and Jaccard. Another evaluation of Ochiai is done by Le et al. \cite{le2013theory} where it was found to have a statistically significant improvement when compared to Tarantula. The paper demonstrates that on average developers only need to inspect 21.02\% of the source code before finding the fault. -AFLuent includes and implementation and evaluation of Ochiai to +AFLuent includes an implementation and evaluation of Ochiai to validate that it performs as expected compared to the Tarantula technique. Additionally, considering that Ochiai is considered a fairly accurate and effective formula to detect faults, AFLuent takes advantage of the performance @@ -170,7 +170,7 @@ \subsubsection{DStar} information of a program to locate and rank faults. The equation for this approach can be found in figure \ref{fig:dstarEquation}. Wong et al. \cite{Wong2014DStar} introduce and extensively evaluate this approach in a 2014 -paper that demonstrate it's effectiveness compared to other formulas. In the +paper that demonstrates its effectiveness compared to other formulas. In the process of constructing D*, the paper lists the factors involved in determining suspiciousness of an element.
The principles are as follows: \begin{enumerate} @@ -185,34 +185,34 @@ \subsubsection{DStar} \end{enumerate} Considering that multiplying \(\textbf{N$_{CF}$}\) by a constant to increase its -weight will not affect the ranking of statements, he authors argue that -rasing \(\textbf{N$_{CF}$}\) to a value * greater than +weight will not affect the ranking of statements, the authors argue that +raising \(\textbf{N$_{CF}$}\) to a value * greater than or equal to 1 would be more appropriate in increasing the weight of this variable. The study continues by illustrating how increasing the value of * produces more clear rankings that facilitate the debugging process by requiring -the developer to examine less elements in bot the best and worst case. However, -the authors also point out that this benefit of increasing teh value of * levels +the developer to examine fewer elements in both the best and worst case. However, +the authors also point out that this benefit of increasing the value of * levels off at a certain point depending on the size of the program under analysis. The paper concludes by reviewing performance results showing that D* is more effective than the previously discussed formulas (Tarantula, Ochiai, and Ochiai2). With that in mind, D* offers the latest and most effective formula to calculate suspiciousness compared to all others included in this research. AFLuent implements D* to validate this step up in effectiveness in the context of -Python projects and gives the user the ability to use t. +Python projects and gives the user the ability to use it. \subsection{Combining Approaches} \label{subsec:combining_approaches} -While AFLuent only relies SBFL approaches in its implementations, it's +While AFLuent only relies on SBFL approaches in its implementations, it's useful to explore other methodologies that could assist in the debugging -process. This creates a guide for potential extention of AFLuent and +process.
This creates a guide for potential extension of AFLuent and provides a way to fill in the shortcomings of AFLuent. Xuan et al. explores the possibility of combining several SBFL metrics of fault localization and introducing a machine learning model to assist with the ranking \cite{Xuan2014Combine}. While AFLuent does not support this approach, Xuan et al. shows some promising results that could potentially uncover performance improvements in fault localization. There are many tricky aspects of this -research, especially that it suggests to train a machine learning model to +research, especially since it suggests training a machine learning model to assist with ranking. Depending on the data used to train the model, the results could be very different. Overall, while AFLuent does not use machine learning, this research provides a great idea for future work and improvements. @@ -236,7 +236,8 @@ \subsection{Acknowledging Problems} \label{subsec:acknowledging_problems} With the multitude of approaches and formulas to use in SBFL, various criticisms -are brought up for each proposed research. In a survey study, Wong et al. +are brought up for each proposed research. Some research even suggests that SBFL +and AFL in general are not effective for all developers \cite{parnin}. In a survey study, Wong et al. \cite{wong2016survey} identifies a series of issues and concerns surrounding SBFL in general. The main one being the central problem of giving failed and successful tests accurate weights in order to produce a meaningful @@ -251,22 +252,22 @@ \subsection{Acknowledging Problems} One of the brought up concerns of SBFL is the inclusion of passed program spectra in calculating suspiciousness of an element. Xie et al.
\cite{xie2010isolating} argue that while a failed program test case does -indicate the presence of an error a passed program spectra/test data, ``is not +indicate the presence of an error, a passed program spectra/test data ``is not guaranteed to be absolutely free of any faulty statement''. With that in mind, -passed tests information alone do not give reliable results on an element +passed test information alone does not give reliable results on an element's suspiciousness. The proposed approach to mitigate this problem is to organize program entities into two main groups, those who have been ``activated'' at least once by a failed program spectra, and ``clean'' ones, which have not at all. The research continues by experimenting with this approach and presenting results that showed some signs of improvement on existing SBFL formulas. Overall, this research provides a way to address inaccuracies with AFLuent and -assists in expending the project beyond simple calculations based on formulas. +assists in expanding the project beyond simple calculations based on formulas. Another concern with the use of SBFL to debug programs is the possibility of having equal suspiciousness scores assigned to multiple statements. These ties hinder the debugging process and present the developer with a dilemma. Which element should be inspected first? they're equally suspicious! This problem -becomes more significant when only one of the tied elements actually contain the +becomes more significant when only one of the tied elements actually contains the fault. A study by Xu et al. \cite{xu2011ties} recognizes this problem and expands on the different outcomes. In the best case, the developer picks the statement containing the fault as their first choice and finds the error right @@ -314,7 +315,7 @@ \section{Existing Tools} program spectra and calculates suspiciousness scores using Tarantula, Ochiai, and DStar approaches.
Overall, CharmFL has many similarities with AFLuent, but it's also less accessible considering that it's a PyCharm plugin which is not -used by every developer. Overall, the implementation of CharmFL provides and +used by every developer. The implementation of CharmFL provides inspiration for AFLuent and encourages improvements where CharmFL may fall short. @@ -335,14 +336,14 @@ \section{Usability and Accessibility} and verbosity of output messages from the tool. Instead of simply displaying the ranked scores of statements, it would be more user friendly to explain the meaning of the output to guide the user into beginning the debugging process. -Kohn \cite{kohn2019error} explores the experience of beginner with Python errors +Kohn \cite{kohn2019error} explores the experience of beginners with Python errors with different severity and various Python interpreter error output. The results confirm that more clear error messages tend to have a higher percentage of students finding and fixing the error. This connection between error output and the ability for beginner developers to fix faults is very crucial in the case of -AFLuent. And while a user survey is out of scope of this research, it's Kohn +AFLuent. And while a user survey is out of scope of this research, Kohn provides encouragement to account for the different use cases in AFLuent and -attempt to provide a clear output that describes the fault and guides the +to attempt to provide a clear output that describes the fault and guides the developer for the next step. Another aspiration of AFLuent is to assist beginners in debugging their code in @@ -350,7 +351,7 @@ \section{Usability and Accessibility} identifying popular python errors in Python among beginners, cause of faults can more quickly be pointed out after statement ranking has been produced. These steps require additional analysis of the suspicious statements by analyzing -their syntax to identify potential cause.
The goal of AFLuent would then become +their syntax to identify potential causes. The goal of AFLuent would then become more than simply locating the fault, but also giving an educated guess regarding the reason behind the error. Cosman et al. \cite{cosman2020pablo} create a tool named PABLO that uses a trained classifier to identify common bugs and faults in diff --git a/chapters/ch03_method.tex b/chapters/ch03_method.tex index 0eaa074..b99b6ef 100755 --- a/chapters/ch03_method.tex +++ b/chapters/ch03_method.tex @@ -4,14 +4,14 @@ \chapter{Method of Approach} This chapter describes the implementation of AFLuent and the experiment setup and execution process. More specifically, the reasoning behind design decisions and the result are the main focus. Additionally, charts and diagrams are used to -demonstrate the the algorithms, structure, and flow of execution. +demonstrate the algorithms, structure, and flow of execution. \section{Development Environment and Toolset} \label{sec:DevEnviron} In order to begin discussing how AFLuent is implemented, a ground-up overview of the tools used and their roles is necessary to establish definitions and facilitate -the understanding of how dependencies they are connected. By being a Python +the understanding of how dependencies are connected. By being a Python package AFLuent can rely on a wide variety of helpful and popular tools. Some of the most important tools and dependencies are discussed below. @@ -19,7 +19,7 @@ \subsection{Poetry} \label{subsec:poetry} Poetry is a Python virtual environment management tool that allows developers to -set up an isolated environment for their projects. Furthermore, it manges the +set up an isolated environment for their projects. Furthermore, it manages the installation of Python dependencies on the virtualenv and updates them when necessary. 
Poetry has a crucial role in the implementation of AFLuent since it's used to make the development process simpler, its role also goes beyond @@ -43,10 +43,10 @@ \subsection{Coverage.py} \label{subsec:coverage} Spectrum-based fault localization requires data on code coverage and test -results in order to calculate and rank suspicious of elements in the code. +results in order to calculate and rank the suspicious elements in the code. Coverage.py\cite{coverage_py_website} is a Python tool that provides an easy to use application programming interface to collect that data. The tool also provides various -configuration for the user to skip certain files or directories from being +configurations for the user to skip certain files or directories from being considered. AFLuent relies on this tool to calculate what's known as per-test coverage. This data describes the lines of code covered by a single test case and organized in an accessible way to find out the number of passing and failing @@ -69,7 +69,7 @@ \subsection{Radon} \subsection{Libcst} \label{subsec:libcst} -In order to provide additional methods to breaking ties between element +In order to provide additional methods to break ties between element rankings, Libcst is used to create an abstract syntax tree of the code in question. This approach allows AFLuent to detect error prone syntax and formulate a score to use to break ties between lines if the need arises. @@ -92,7 +92,7 @@ \section{AFLuent as a Pytest Plugin} packaged hooks that fit in the workflow of Pytest. AFLuent makes use of five different hooks to implement automated fault localization. Figure \ref{fig:pytest_flow} shows a general overview of the steps changed in the -workflow of Pytest. Additionally, the section below describes in details how each +workflow of Pytest. Additionally, the section below describes in detail how each step was modified. \section{Installing AFLuent} @@ -103,7 +103,7 @@ \section{Installing AFLuent} complications. 
To achieve that, AFLuent is published to the Python Package Index (PyPI), which makes it installable through the \code{pip install afluent} command. Once this command runs successfully, AFLuent is automatically -integrated with Pytest as a plugin and will be ran with every pytest session +integrated with Pytest as a plugin and will be run with every pytest session when the user specifies. AFLuent's dependency on Coverage.py creates a small but avoidable conflict. In the case that AFluent and another plugin that utilizes Coverage.py is active in the same Pytest session, various errors might @@ -119,7 +119,7 @@ \subsection{Adding Command-Line Arguments} \label{subsec:pytest_cli} Pytest already supports a multitude of command line arguments that allow the -user to pass configuration that change how the test suite is executed and +user to pass configurations that change how the test suite is executed and reported. Similarly, AFLuent requires user passed arguments to complete a variety of tasks. The hook \code{pytest\_addoption} allows adding new arguments in a fashion similar to the \code{argparse} Python library. @@ -156,7 +156,7 @@ \subsection{Adding Command-Line Arguments} \end{itemize} \item The types of file reports to create after the Pytest session is over \begin{itemize} - \item \code{---report}: accepts \code{json} or \code{csv} and generate + \item \code{---report}: accepts \code{json} or \code{csv} and generates reports with the passed format. \item \code{---per-test-report}: requires that a per-test coverage report is produced. This report is only generated in JSON format. @@ -174,7 +174,7 @@ \subsection{Adding Command-Line Arguments} \subsection{Activating AFLuent} \label{subsec:activate_afluent} -After arguments are passed, the next steps parses through some of them to check +After arguments are passed, the next steps parse through some of them to check if AFLuent was enabled and to validate some of their values. 
The Pytest hook \code{pytest\_cmdline\_main} gives access to the collected configuration. In this hook, checks are conducted to see if there are other active plugins that @@ -195,7 +195,7 @@ \subsection{Calculating Per-test Coverage and Test Result} \begin{enumerate} \item \code{cov.start()} begins recording coverage - \item the hook yields back control to Pytest which calls the individual test case + \item The hook yields back control to Pytest which calls the individual test case \item \code{cov.stop()} stops recording coverage \item The collected data is then organized in a simpler structure defined as the program spectra @@ -212,7 +212,7 @@ \subsection{Reporting Results} The last step in AFLuent execution as part of Pytest is to report the fault localization outcome. The \code{pytest\_sessionfinish} hook is used to detect the -exit code of the session and display output on the console accordingly. An exist +exit code of the session and display output on the console accordingly. An exit code of 0 means that all tests have passed and there is no need to perform fault localization, therefore, a message would display that to the user before finishing the Pytest run. On the other hand an exit code of 1, would indicate @@ -268,7 +268,7 @@ \subsubsection{Division by Zero: Tarantula} to acknowledge that the outcomes reached here only apply to code that has been covered by the test suite. Faulty lines, which are not covered by any test case will not be investigated since there is no data to calculate their -suspiciousness. Using values plugged in to Figure \ref{fig:tarantulaEquation}, the +suspiciousness. Using values plugged into Figure \ref{fig:tarantulaEquation}, the examples in Figure \ref{fig:taran_div_by_zero_1} and Figure \ref{fig:taran_div_by_zero_2} show the two possible cases where division by zero might occur in the Tarantula equation. 
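The guarded Tarantula computation described in this hunk can be sketched in Python as follows. This is a minimal illustration of the two division-by-zero cases discussed in the text; the function and parameter names are assumptions for the sketch, not AFLuent's actual API:

```python
def tarantula(passed_cover, failed_cover, total_passed, total_failed):
    """Tarantula suspiciousness with the division-by-zero guards
    described above (illustrative names, not AFLuent's API)."""
    if total_failed == 0:
        return 0.0  # no failed tests at all: the element is not suspicious
    if total_passed == 0:
        return 1.0  # no passing tests at all: maximum suspiciousness
    fail_ratio = failed_cover / total_failed
    pass_ratio = passed_cover / total_passed
    if fail_ratio + pass_ratio == 0:
        return 0.0  # line covered by no test in either category
    return fail_ratio / (fail_ratio + pass_ratio)
```

The two early returns correspond to the two cases illustrated in the referenced figures: zero total failed tests yields a score of 0, and zero total passing tests yields the maximum score of 1.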
@@ -293,16 +293,16 @@ \subsubsection{Division by Zero: Tarantula} \end{center} \end{figure} -Figure \ref{fig:taran_div_by_zero_1} shows the case where there is a total of zero +Figure \ref{fig:taran_div_by_zero_1} shows the case where there are zero total failed test cases, covering AND not covering the element, in which a score of 0 is assigned to the element. On the other hand, -Figure \ref{fig:taran_div_by_zero_2} shows the case where there is a total of zero +Figure \ref{fig:taran_div_by_zero_2} shows the case where there are zero total passing test cases, in which a maximum score of 1 is assigned to the element. \subsubsection{Division by Zero: Ochiai} \label{subsubsec:div_by_zero_ochiai} -Division by zero occurs in the Ochiai formula when there is not test coverage +Division by zero occurs in the Ochiai formula when there is no test coverage information for a line or when the total number of failed tests is zero. The latter case indicates that the line is not suspicious since it did not cause any failures, therefore, zero is returned as the suspiciousness score. @@ -332,9 +332,9 @@ \subsubsection{Division by Zero: Ochiai} \end{center} \end{figure} -Figure \ref{fig:ochiai_div_by_zero_1} is an example of when there is zero total +Figure \ref{fig:ochiai_div_by_zero_1} is an example of when there are zero total failed test cases, resulting in a zero suspiciousness score. -However Fig\ref{fig:ochiai_div_by_zero_2} shows an example where there is no +However, Figure \ref{fig:ochiai_div_by_zero_2} shows an example where there are no failed or successful tests that cover the line. This scenario does not occur in AFLuent, which only looks at lines that have some coverage data through passing or failing tests. Therefore, it wasn't necessary to handle this possibility. @@ -419,7 +419,7 @@ \subsubsection{Division by Zero: DStar} In the DStar equation, division by zero takes place only in one case.
When the number of passing test cases that cover the line AND the number of failed test cases that do not cover the line are both zero, the denominator evaluates to -zero. This translates to the following: if there is no passing tests executing +zero. This translates to the following: if there are no passing tests executing this line and no failing test executing other lines only, then this line should have the maximum suspiciousness score possible. However, since DStar has a numerator raised to a power set by the user, it has no numerical upper limit on @@ -443,7 +443,7 @@ \subsection{ProjFile Object} ProjFile objects are designed to contain attributes that describe whole files. Additionally, they support functionality that apply to these files. The most -important attributes of these objects is the \code{lines} instance variable, +important attribute of these objects is the \code{lines} instance variable, which stores a dictionary of contents of the file. Specifically, the keys in this dictionary are line numbers in the file and the values are the Line objects discussed previously. In addition to storing this data, ProjFile implements an @@ -492,11 +492,11 @@ \subsection{Objects Overview} \end{center} \end{figure} -Fig\ref{fig:oop_structure} provides a visual simplification of the different +Figure \ref{fig:oop_structure} provides a visual simplification of the different components of AFluent and an overview of their roles in the functioning of the tool. Overall the nested structure creates several layers that facilitate development by isolating the different components and hiding unnecessary -information from other objects in the hierarchy. By following this structures, +information from other objects in the hierarchy. By following this structure, unit tests can be written much easier and debugging becomes a simpler task.
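The nested Line/ProjFile structure described in these hunks can be sketched roughly as below. The class names and the `lines` dictionary keyed by line number mirror the text, but the remaining fields and the helper method are illustrative assumptions, not AFLuent's exact implementation:

```python
class Line:
    """One source line's per-test coverage record (fields are assumptions)."""

    def __init__(self, number):
        self.number = number
        self.passed_by = set()  # ids of passing tests that cover this line
        self.failed_by = set()  # ids of failing tests that cover this line
        self.score = 0.0        # suspiciousness, filled in later


class ProjFile:
    """A whole file: a dictionary of Line objects keyed by line number."""

    def __init__(self, path):
        self.path = path
        self.lines = {}  # {line_number: Line}, as described in the text

    def register(self, line_number, test_id, passed):
        # Hypothetical helper: record that `test_id` covered this line.
        line = self.lines.setdefault(line_number, Line(line_number))
        (line.passed_by if passed else line.failed_by).add(test_id)
```

Isolating per-line data behind these small objects is what makes the unit testing and debugging benefits mentioned above plausible: each layer can be exercised on its own.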
\section{AFLuent's Output} @@ -522,10 +522,10 @@ \subsubsection{Success Messages} \label{subsubsec:success_message} This message is produced in the case that the test suite passes with no -error or failures. Using bright green highlighted message with bold white +error or failures. Using bright green highlighted text with bold white letters, the message displays: \code{All tests passed, no need to diagnose using -AFLuent}. Fig\ref{fig:success_message} demonstrates the success message output -when AFLuent is ran on the project's test suite. +AFLuent}. Figure \ref{fig:success_message} demonstrates the success message output +when AFLuent is run on the project's test suite. \begin{figure}[!htb] \begin{center} @@ -580,7 +580,7 @@ \subsubsection{Warning Messages} Python environment but not enabled by the user through the \code{---afl} or \code{---afl-debug} flags. This message serves as a reminder to the user to enable the plugin if they're interested in utilizing fault localization. The -message is shown in Fig\ref{fig:warning_message_2}. +message is shown in Figure \ref{fig:warning_message_2}. \begin{figure}[!htb] \begin{center} @@ -645,7 +645,7 @@ \subsection{Console Report} suspiciousness score possible (usually 1), the color indicates that these elements are extremely likely to be the ones causing the fault. For the remaining results, the top 20\% are highlighted using orange to show that they -are risky of being faulty. The remaining non-zero elements are highlighted using +are at risk of being faulty. The remaining non-zero elements are highlighted using yellow. Figure \ref{fig:report_2} shows an example of how safe statements are displayed in the report. @@ -685,7 +685,7 @@ \subsection{Random} \label{subsec:tiebreak_random} Random tie breaking is the baseline approach for dealing with ties in -suspiciousness scores. It's the method to compare other approaches to in order
Other approaches are compared to random tie breaking in order to detect if there are any improvements. In random tie breaking, statements are ranked in a descending order by the chosen suspiciousness score first, however, the order between tied elements is random. This could lead to different rankings @@ -698,7 +698,7 @@ \subsection{Cyclomatic Complexity} complexity as a secondary score to consider when sorting. This score is proposed by McCabe \cite{cyclomatic_complexity} and can be easily calculated using the Radon library. It measures the number of available paths that execution -cold go through in a function. Since this type of score only applies to whole +could go through in a function. Since this type of score only applies to whole functions and not to individual elements, lines inherit the cyclomatic complexity score of the function they live in when being ranked.Essentially, if a function has a cyclomatic complexity score of 7, then all statements within @@ -714,7 +714,7 @@ \subsection{Mutant Density: Logical Set} density \cite{Parsai_2020}. More specifically, this score indicates how error prone the statement is by calculating the number of all possible mutants. For example, a statement that has many mathematical and logical operators to perform a calculation is -more error prone that a statement that only has one or two of these operations. +more error prone than a statement that only has one or two of these operations. With that in mind, AFLuent uses this information to break ties between statements that have the same suspiciousness score but are syntactically different. @@ -756,22 +756,22 @@ \subsection{Mutant Density: Enhanced Set} \subsection{Mutant Density: Enhanced Set} \label{subsec:tiebreak_mutant_density_enhanced} -This approach to tie breaking also uses mutant density evaluate how error prone +This approach to tie breaking also uses mutant density to evaluate how error prone a statement is.
However, it seeks to provide a more holistic metric that also
-considers how error prone the constructs that an a statement is nested in. For
+considers how error prone the constructs that a statement is nested in are. For
example, a statement inside a multi-level if statement, which is also nested in
a loop, is more error prone than a statement which is outside these constructs.
Since there is more room for errors in loops and if statement conditions, a
statement nested in them takes on this risk of error. In order to measure this
-score, the list of mutant used in the logical set was extended to include
+score, the list of mutants used in the logical set was extended to include
additional constructs that might contain the error. Additionally, the
tiebreaker looks through and scores each block that the statement is nested in.
-Table\ref{table:enhanced_set_mutants} shows the additional mutants that are
-looked for in the enhanced set. Additionally, Table\ref{table:construct_scoring}
+Table \ref{table:enhanced_set_mutants} shows the additional mutants that are
+looked for in the enhanced set. Additionally, Table \ref{table:construct_scoring}
discusses how a score is assigned to each construct while parsing through the
syntax tree and assessing how error prone a statement is. All of these
approaches are used to calculate a score that gets used in breaking ties of
-suspicious statements while ranking. Figure\ref{fig:enhanced_score_equation}
+suspicious statements while ranking. Figure \ref{fig:enhanced_score_equation}
shows how each construct score is used in generating the final score of a statement.
\begin{table}[!htb]
@@ -864,7 +864,7 @@ \subsection{Tiebreaking Overview}
Using all the tie breaking approaches discussed previously, this subsection
provides an overview and an example that demonstrates tie breaking on a sample
-program. Table\ref{table:scoring_examples} shows a sample program that implement
+program. 
Table \ref{table:scoring_examples} shows a sample program that implements two functions with some conditional logic and simple mathematical operations. It also contains three columns with each scoring approach. Starting with the cyclomatic scores, one can see that all statements in a function contain the same score. diff --git a/chapters/ch04_experiments.tex b/chapters/ch04_experiments.tex index a7d480d..616866d 100755 --- a/chapters/ch04_experiments.tex +++ b/chapters/ch04_experiments.tex @@ -14,7 +14,7 @@ \subsection{Approach Overview} In order to evaluate AFLuent, several prerequisites are needed that enable collecting data for analysis. The primary requirement is a collection of Python -programs, which are susceptible of becoming faulty. Additionally, this +programs, which are susceptible to becoming faulty. Additionally, this collection's complexity must be comparable to code typically written by novice developers, which AFLuent targets. Another crucial requirement before evaluation can begin is a test suite for the selected code python code. The test suite must @@ -40,8 +40,8 @@ \subsection{Research Questions} \label{subsec:research_questions_eval} Before discussing the evaluation process, it's important to clearly state the -questions to answer. Previously, Section\ref{sec:researchq} brought up few -research question concerning the implementation and evaluation of AFL in Python. +questions to answer. Previously, Section \ref{sec:researchq} brought up a few +research questions concerning the implementation and evaluation of AFL in Python. Related Work and Methods section addressed \hyperref[para:RQ1.1]{\emph{RQ1.1}} as well as \hyperref[para:RQ2]{\emph{RQ2}}, however, the answer \hyperref[para:RQ1.2]{\emph{RQ1.2}} remains unclear. 
While \hyperref[para:RQ1.2]{\emph{RQ1.2}} generally involved the efficiency and accuracy of AFLuent, there was no mention
@@ -50,7 +50,7 @@ \subsection{Research Questions}
equation used, Tarantula, Ochiai, Ochiai2, and Dstar, there are four tie
breaking approaches. Each equation-tiebreaker pair will be evaluated on the same
dataset, where the resulting rankings will be used to produce a score to assess how
-close the produced ranking are to localizing the fault correctly. In addition to
+close the produced rankings are to localizing the fault correctly. In addition to
this score, the time taken to run each approach will be recorded to compare
their time overhead.
@@ -89,7 +89,7 @@ \subsubsection{Filtering Sample}
expedite generating a test suite using Pynguin, several projects that use
external packages were removed. These packages include \code{scikit-learn},
\code{Tensorflow}, \code{Matplotlib}, \code{Sympy}, and \code{PIL}.
- Generating data and creating unit tests for projects that using theses
+ Generating data and creating unit tests for projects that use these
packages can be difficult because they are time consuming or simply cannot
be tested due to their graphical output.
\item Functions that do not take input or return no results: some implemented
@@ -105,7 +105,7 @@ \subsubsection{Filtering Sample}
perform. Therefore, these functions were removed. In some instances where
the documentation explicitly stated the type of input for the function, type
hints were manually added to avoid the removal of the function. This was
- especially frequent in sorting function, in which the input was specified as
+ especially frequent in sorting functions, in which the input was specified as
integer values. 
\item Code snippets under \code{if \_\_name\_\_ == "\_\_main\_\_"}: In most
instances the code under this if statement either ran the doctest tests, or
@@ -136,9 +136,9 @@ \subsubsection{Filtering Sample}
Following the initial phase of filtering the codebase, general statistics and
observations were recorded for the remaining sample thus far.
Table\ref{table:remaining_projects} shows the name of the remaining project.
-Additionally, SLOCcount was used to calculate the number of non-comment line of
+Additionally, SLOCcount was used to calculate the number of non-comment lines of
code included in the sample. The output from the tool is shown in
-Fig\ref{fig:SLOCcount_phase1}. Overall, there was 12199 lines of code remaining
+Figure \ref{fig:SLOCcount_phase1}. Overall, there were 12199 lines of code remaining
in the sample prior to automatically generating tests using Pynguin.
\begin{figure}[!htb]
@@ -186,7 +186,7 @@ \subsubsection{Filtering Sample}
\subsubsection{Generating a Test Suite}
\label{subsubsec:generating_test_suite}
-Once initial filtering of the codebase was completed, Pynguin was ran to
+Once initial filtering of the codebase was completed, Pynguin was run to
generate tests. However, additional issues came up in this process that required
additional filtering to be done. The new content was filtered as follows:
\begin{itemize}
@@ -197,7 +197,7 @@ \subsubsection{Generating a Test Suite}
getting timed out. Finally, there were 249 modules where tests were
successfully generated. Modules with failed and timed out runs were removed
due to the lack of tests that cover them.
- \item Some generated tests were faulty and caused errors when ran, or
+ \item Some generated tests were faulty and caused errors when run, or
indeterminately failed making them flaky. Those tests were removed in some
instances or fixed when possible. 
\item Since AFLuent relies on a thorough test suite with high coverage in @@ -245,7 +245,7 @@ \subsubsection{Generating a Test Suite} \begin{center} \includegraphics[width=15.5cm]{cyclomatic_complexity.png} \caption{\label{fig:cyclomatic_complexity_of_sample} Cyclomatic - Complextiy of Sample} + Complexity of Sample} \end{center} \end{figure} @@ -254,7 +254,7 @@ \subsubsection{Generating a Test Suite} Figure\ref{fig:SLOCcount_phase2}. While the codebase was significantly reduced in size, the resulting test suite contains 1105 test cases with 99\% coverage. In order to get a better understanding of the remaining sample, additional data -was collected on the cyclomatic complexity the functions in the code. This +was collected on the cyclomatic complexity of the functions in the code. This information would give some ideas on the structure of the code regarding if statements, loops and other constructs. The cyclomatic complexity of the remaining 516 functions was calculated and plotted as shown on Figure @@ -279,7 +279,7 @@ \subsection{Data Collection} an automated approach to collect this data. Some existing tools such as \emph{mutmut} \cite{mutmut} already perform similar steps for mutation testing and it could be easily repurposed to generate the bugs/mutants and run -AFLuent after inserting a mutant into the codebase. Futhermore, \emph{mutmut} +AFLuent after inserting a mutant into the codebase. Furthermore, \emph{mutmut} supports a hook function that facilitates collecting results after each run and before the next mutant is applied. Lastly the test suite run command can be modified to run AFLuent in evaluation mode and collect fault localization data. @@ -290,7 +290,7 @@ \subsection{Data Collection} approach-tiebreaker combination. Note that a value of 3 was used for \code{*} in the DStar equation. 
\item Per-test coverage report - \item Timing report containing the time taken to run the test suite befor + \item Timing report containing the time taken to run the test suite before fault localization and the time to perform fault localization and get rankings. \item Information about the mutant such as the file and line number it was @@ -344,7 +344,7 @@ \subsection{Results} only include lines from that project. For example, when the \emph{EXAM} score is calculated for a fault in the \code{bit\_manipulation} project, the percentage of lines belonging to that project that a developer must read before -finding the fault is the score. The lower the exam score for an approach, to +finding the fault is the score. The lower the exam score for an approach, the more effective it is because the developer would have to analyze less lines before finding the fault. @@ -355,7 +355,7 @@ \subsection{Results} Additionally, heat maps are used to compare each pair to the other fifteen after completing statistical tests. -\subsubsection{Data Vizualization} +\subsubsection{Data Visualization} \label{subsubsec:data_vizualization} To compare suspiciousness score equations when the tie breaking approach is the @@ -422,7 +422,7 @@ \subsubsection{Data Vizualization} so. But, further statistical analysis is needed to come to that conclusion. When analyzing Figure \ref{fig:logical_tiebreak_boxplot}, there is a noticeable -visual difference to the previous plots. Specifically, the the median +visual difference to the previous plots. Specifically, the median \emph{EXAM} score for all approaches using logical tie breaking is smaller when compared to random and cyclomatic tie breaking. While this implies an improvement, it's still unclear if it's statistically significant. In addition @@ -433,7 +433,7 @@ \subsubsection{Data Vizualization} still appears to be the worst performing while DStar has the lower scores overall. 
-Lastly, \emph{EXAM} score while using enhanced tie breaking are shown in Figure
+Lastly, \emph{EXAM} scores while using enhanced tie breaking are shown in Figure
\ref{fig:enhanced_tiebreak_boxplot}. When comparing the results of this figure
to those in Figure \ref{fig:logical_tiebreak_boxplot}, the median values as well
as maximums appear to be higher. This indicates that enhanced tie breaking might
@@ -452,7 +452,7 @@ \subsubsection{Data Vizualization}
In addition to the box plots that showcase the medians, and the quartiles, it's
worth looking at the average exam score for each approach. Figure
\ref{fig:averages_barplot} plots the \emph{EXAM} score averages for all
-categories. This barplot further demonstrates the low performance of Tarantula,
+categories and lists the value of the mean at the top of each bar. This bar plot further demonstrates the low performance of Tarantula,
in which it has the highest averages out of all other equations. Based on the
averages in the plot, the best performing approach is DStar with logical tie
breaking with an average of 18.27\% \emph{EXAM} score. It's followed by Ochiai
@@ -492,8 +492,9 @@ \subsubsection{Statistical Tests: Mann-Whitney}
statistically different. However, it does not conclude which approach had the higher
or lower exam scores. Figure \ref{fig:two_sided_mw_test} plots a heat map of the resulting
\emph{p-value}s from the two sided Mann-Whitney test. It compares each
-approach to the fifteen remaining ones.
-
+approach to the fifteen remaining ones. \emph{P-value}s for each comparison are
+included in each square, where squares with a lower \emph{p-value} are
+darker than those with a larger value.
\begin{figure}[!htb]
\begin{center}
\includegraphics[width=15.5cm]{two_sided_mw_test.png}
\caption{\label{fig:two_sided_mw_test} Two Sided Mann Whitney Test Results}
\end{center}
\end{figure}
@@ -508,9 +509,9 @@ \subsubsection{Statistical Tests: Mann-Whitney}
show that these sets are different from all non-Tarantula approaches to a very
statistically significant level. 
Therefore, the null hypothesis is rejected -here. There are only few exceptions, where Tarantula Enhanced and Tarantula +here. There are only a few exceptions, where Tarantula Enhanced and Tarantula Logical are not different from the random and cyclomatic variants of Ochiai, -Ochiai2, and DStar, so the null hypotheses is accepted for these exceptions. +Ochiai2, and DStar, so the null hypothesis is accepted for these exceptions. These cases represent an interesting case where using the logical and enhanced tiebreakers was a significant enough improvement to bring Tarantula in line with the random and cyclomatic variants of other equations. @@ -530,7 +531,7 @@ \subsubsection{Statistical Tests: Mann-Whitney} level. Therefore, the null hypothesis is accepted for these combinations. While the results of this test are helpful to show if statistically significant -differences exists between the different approaches, it does not clarify which +differences exist between the different approaches, it does not clarify which one is better than the other. To have a better understanding of that relationship, a one sided \code{less} Mann-Whitney test is performed with the hypotheses are as follows: @@ -549,7 +550,7 @@ \subsubsection{Statistical Tests: Mann-Whitney} In contrast to the two-sided test, this one gives a more clear idea on which approaches has a lower \emph{EXAM} scores when compared to others. A \emph{p-value} less than or equal to 0.05 suggests that the approach on the x-axis is -significantly more effective than it's counterpart on the y-axis. On the other +significantly more effective than its counterpart on the y-axis. On the other hand, a \emph{p-value} greater than 0.95 indicates the opposite, where the approach on the y-axis is significantly more effective than the one on the x-axis. 
Figure \ref{fig:one_sided_mw_test} plots a heat map of the resulting \emph{p-value}s
@@ -573,13 +574,13 @@ \subsubsection{Statistical Tests: Mann-Whitney}
by others, however, it was less frequently.
As for other equations, the Random and Cyclomatic variants of Ochiai, Ochiai2,
-and DStar all had similar results between each other, where non of them
+and DStar all had similar results between each other, where none of them
outperformed another on a significant level. Therefore, for these approaches,
the null hypotheses is accepted. This result matches the outcome of the
two-sided test where these approaches were very similar.
Some notable large differences in performance can be seen in the logical variant
-of Ochiai, Ochiai2, and DStar. Theses three significantly outperformed every
+of Ochiai, Ochiai2, and DStar. These three significantly outperformed every
random and cyclomatic variant of all other equations. On the other hand, the
enhanced variant of these slightly outperformed the cyclomatic and random ones,
but not on a statistically significant degree. This result suggests that the
@@ -590,14 +591,14 @@ \subsubsection{Statistical Tests: Mann-Whitney}
One of the surprising outcomes of this data is the performance of enhanced tie
breaking. Despite the fact that it adds on the mutant density metric provided by
-logical, and considers wider possibility while generating it's score, it did not
+logical, and considers a wider range of possibilities while generating its score, it did not
outperform the logical tie breaker. In fact, the \emph{p-value} was leaning more
to the other outcome, but not in any significant way. 
\subsubsection{Statistical Tests: Cohen's D Effect Size}
\label{subsubsec:statistical_test_cohen}
-While the Mann-Whitney test checks if the distributions of the approaches is
+While the Mann-Whitney test checks if the distributions of the approaches are
different to others on a statistically significant level, it does not give an
idea of the magnitude of this difference. In order to get that information, the
non-parametric Cohen's D effect size is calculated for each pair of approaches
@@ -624,8 +625,8 @@ \subsubsection{Statistical Tests: Cohen's D Effect Size}
Ochiai2 logical with DStar logical, the values 0.0013 and 0.0021 respectively
show that DStar had an improvement over the other two. This improvement is very
small and, as established in the previous test, not statistically significant.
-As for the comparison between Ochiai logical and Ochiai2 logical, and even
-smaller value indicate a very small difference in favor of Ochiai logical.
+As for the comparison between Ochiai logical and Ochiai2 logical, an even
+smaller value indicates a very small difference in favor of Ochiai logical.
Again, this difference is not statistically significant to reach a conclusion
of which is the better approach. Overall, the performed statistical tests
allowed the filtering of 16 different approaches to determine the top 3 ones that
@@ -639,7 +640,7 @@ \subsubsection{Time Efficiency}
execute tests without AFLuent, (II) Time to execute tests with AFLuent enabled,
(III) Time to locate faults and perform tie breaking using all
equation-tiebreaker combinations. Generally all these times were calculated when
-all 1105 test cases in the project's suite were ran.
+all 1105 test cases in the project's suite were run.
\begin{figure}[!htb]
\begin{center}
@@ -650,7 +651,7 @@ \subsubsection{Time Efficiency}
Figure \ref{fig:test_timings} compares the time taken to execute all test
functions with and without using AFLuent. 
The baseline includes
-4000 data points where the tests were ran without AFLuent. On the other hand
+4000 data points where the tests were run without AFLuent. On the other hand
7613 data points are plotted for when AFLuent generated a timing report. It's
important to mention that every point in the boxplot on the right represents the
combined time to run all equation-tiebreaker combinations after one another.
@@ -676,8 +677,8 @@
\end{center}
\end{figure}
-Another useful time metric for developers hoping to use AFLuent is the time the
-took takes to locate faults. And while
+Another useful time metric for developers hoping to use AFLuent is the time it
+took to locate faults. And while
comprehensive time data for each equation and tie breaking approach was not
collected, some conclusions can be drawn from the most time consuming case.
Figure \ref{fig:localization_timings} shows the time taken to generate a
@@ -685,8 +686,8 @@
suspiciousness scores using all equations and to retrieve and evaluate tie
breaking values using all approaches. In general, this is the worst possible
scenario for AFLuent localization time. While the median time is quite large,
-around 115 second, so is the test suite. For every run, AFLuent generates and parses
-abstract syntax trees using libcst and calculates cyclomatic complexity fo all
+around 115 seconds, so is the test suite. For every run, AFLuent generates and parses
+abstract syntax trees using LibCST and calculates cyclomatic complexity for all
files covered in the suite. In the case of this evaluation, this includes all
files in the codebase. Further discussion on how this time can be minimize is
found in Section \ref{sec:future_work}: Future Work. 
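For readers who want the metrics from this chapter in executable form, the sketch below implements three of the four suspiciousness equations (Tarantula, Ochiai, and DStar with \code{*} = 3, as used in this evaluation) together with the \emph{EXAM} score. The function names and signatures are illustrative only and do not reflect AFLuent's internal API; \code{ef} and \code{ep} denote the counts of failing and passing tests that cover a given line.

```python
import math

def tarantula(ef, ep, total_failed, total_passed):
    """Ratio of the line's failing-test coverage to its overall coverage."""
    fail_ratio = ef / total_failed if total_failed else 0.0
    pass_ratio = ep / total_passed if total_passed else 0.0
    denominator = fail_ratio + pass_ratio
    return fail_ratio / denominator if denominator else 0.0

def ochiai(ef, ep, total_failed):
    """Highest when only failing tests cover the line."""
    denominator = math.sqrt(total_failed * (ef + ep))
    return ef / denominator if denominator else 0.0

def dstar(ef, ep, total_failed, star=3):
    """DStar with the exponent set to 3, matching this evaluation."""
    denominator = ep + (total_failed - ef)
    return ef ** star / denominator if denominator else float("inf")

def exam_score(ranking, faulty_line, total_lines):
    """Percentage of the project's lines inspected before reaching the fault."""
    position = ranking.index(faulty_line) + 1
    return 100 * position / total_lines
```

As a quick sanity check, a line covered by every failing test and by no passing test receives the maximum Ochiai score of 1, while the \emph{EXAM} score simply reports the line's ranked position as a percentage of the project's lines.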
diff --git a/chapters/conclusion.tex b/chapters/conclusion.tex
index 6fac222..5f5adff 100755
--- a/chapters/conclusion.tex
+++ b/chapters/conclusion.tex
@@ -15,7 +15,7 @@ \section{Summary of Results}
best and worst performances. One observation that stood out is the
significant underperformance of Tarantula when compared to all other
equations. Additionally, the statistically significant edge that logical tie
-breaking achieve to outperform it's random and cyclomatic counterparts was
+breaking achieves over its random and cyclomatic counterparts was
surprising. Overall, Ochiai, Ochiai2 and DStar had very strong performances
when the same tie breaker was used across all of them, however, the data did not
suggest that one outperformed the other significantly. Additional experiments may
@@ -58,16 +58,16 @@ \subsubsection{Effectiveness}
\subsubsection{Efficiency}
One of the concerning outcomes of this study is the long time AFLuent takes to
-produce fault localization output. Developer usually want fast and optimized
+produce fault localization output. Developers usually want fast and optimized
tools, which could render AFLuent unusable in the eyes of many. And while
AFLuent's reliance on several tools reduce the ability to control the time it
takes to run, there are some measures that could mitigate this problem.
Throughout manual testing of the tool, it was generally observed that the most
time consuming feature of AFLuent involves generating the tie breaker datasets.
-In oder for AFLuent to adequately understand the code under test, it must
-generate abstract syntax tree for every file covered in the test suite. Since
+In order for AFLuent to adequately understand the code under test, it must
+generate an abstract syntax tree for every file covered in the test suite. 
Since the debugging process usually involves making small changes at a time and -rerunning the tests, a time improvement is possible be caching all generated +rerunning the tests, a time improvement is possible by caching all generated syntax trees and only re-generating ones for files that have been edited since the last run. This solution will not reduce the runtime for the first time but it could have a significant effect on the runs that follow. Overall, this change @@ -104,7 +104,7 @@ \subsubsection{Evaluation} \item Students can be split into groups to complete different assignments \item Chosen assignments should allow for test driven development \end{itemize} - \item Allow some groups of students to use AFLuent and assess it's + \item Allow some groups of students to use AFLuent and assess its effectiveness in helping them locate faults \item Collect direct feedback regarding the students' experience while using AFLuent \end{enumerate} @@ -131,7 +131,7 @@ \section{Ethical Implications} negatively impact their development skills by taking away the experience of manually analyzing the code and understanding its expected behavior and why failures occur. This could be especially harmful if AFLuent was overused in -educational environments since students could utilize it's functionality without +educational environments since students could utilize its functionality without fully understanding how to fix the code themselves, and thus negatively impact their learning process. 
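The caching improvement proposed in the conclusion can be sketched in a few lines. This is a minimal illustration using the standard-library \code{ast} module with a cache keyed on a content hash; AFLuent itself parses with LibCST, and the cache layout here is an assumption for demonstration rather than a description of the tool.

```python
import ast
import hashlib
from pathlib import Path

# Hypothetical cache: maps a file path to (content digest, parsed tree).
_tree_cache = {}

def parse_with_cache(path):
    """Re-parse a file only when its contents changed since the previous run."""
    source = Path(path).read_text()
    digest = hashlib.sha256(source.encode()).hexdigest()
    cached = _tree_cache.get(path)
    if cached is not None and cached[0] == digest:
        return cached[1]  # file unchanged: reuse the stored syntax tree
    tree = ast.parse(source)
    _tree_cache[path] = (digest, tree)
    return tree
```

Hashing the contents rather than trusting modification times keeps the cache correct even when files are restored or touched without edits, at the cost of reading every file on each run.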
diff --git a/preamble/bibliography.bib b/preamble/bibliography.bib
index d05aaa0..ce47482 100755
--- a/preamble/bibliography.bib
+++ b/preamble/bibliography.bib
@@ -259,4 +259,21 @@ @inproceedings{Parsai_2020
author = {Ali Parsai and Serge Demeyer},
title = {Mutant Density},
booktitle = {Proceedings of the {IEEE}/{ACM} 42nd International Conference on Software Engineering Workshops}
+}
+
+@inproceedings{parnin,
+author = {Parnin, Chris and Orso, Alessandro},
+title = {Are Automated Debugging Techniques Actually Helping Programmers?},
+year = {2011},
+isbn = {9781450305624},
+publisher = {Association for Computing Machinery},
+address = {New York, NY, USA},
+url = {https://doi.org/10.1145/2001420.2001445},
+doi = {10.1145/2001420.2001445},
+booktitle = {Proceedings of the 2011 International Symposium on Software Testing and Analysis},
+pages = {199--209},
+numpages = {11},
+keywords = {user studies, statistical debugging},
+location = {Toronto, Ontario, Canada},
+series = {ISSTA '11}
}
\ No newline at end of file
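For reference, the one-sided \code{less} Mann-Whitney comparison used in the experiments chapter can also be sketched in plain Python. This version uses midranks and a normal approximation without tie correction, purely for illustration; the thesis evaluation relied on a statistics library rather than code like this.

```python
import math

def _midranks(values):
    """Assign 1-based ranks, giving tied observations the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        midrank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = midrank
        i = j + 1
    return ranks

def mann_whitney_less(x, y):
    """Approximate p-value for H1: values in x tend to be smaller than values in y."""
    n1, n2 = len(x), len(y)
    ranks = _midranks(list(x) + list(y))
    rank_sum_x = sum(ranks[:n1])
    u_statistic = rank_sum_x - n1 * (n1 + 1) / 2
    mean = n1 * n2 / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u_statistic - mean) / sd
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))  # standard normal CDF at z
```

A resulting \emph{p-value} at or below 0.05 supports the claim that the first sample's \emph{EXAM} scores are systematically lower, matching the interpretation applied to the one-sided heat map in the experiments chapter.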