
\chapter{Visualizing Large Models}
\label{chap:VisualizingLargeModels}

ParaView is used frequently at Sandia National Laboratories and other
institutions for visualizing data from large-scale simulations run on the
world's largest supercomputers, including the examples shown here.

\begin{inlinefig}
  \begin{tabular}{cc}
    \includegraphics[width=0.431\linewidth]{images/Asteroid} &
    \includegraphics[width=0.469\linewidth]{images/PolarVortex} \\
    \parbox[t]{0.431\linewidth}{\footnotesize CTH shock physics simulation
      with over 1 billion cells of a 10 megaton explosion detonated at the
      center of the Golevka asteroid.} &
    \parbox[t]{0.469\linewidth}{\footnotesize SEAM Climate Modeling
      simulation with 1 billion cells modeling the breakdown of the polar
      vortex, a circumpolar jet that traps polar air at high latitudes.}
  \end{tabular}

  \begin{tabular}{cc}
    \includegraphics[width=0.474\linewidth]{images/LargeAMR} &
    \includegraphics[width=0.426\linewidth]{images/Crossflow} \\
    \parbox[t]{0.474\linewidth}{\footnotesize A CTH simulation that
      generates AMR data.  We have used ParaView to visualize CTH
      simulation AMR data comprising billions of cells, hundreds of thousands
      of blocks, and eleven levels of hierarchy (not shown).} &
    \parbox[t]{0.426\linewidth}{\footnotesize A PHASTA simulation of 3.3
      billion tetrahedral cells involving the flow over a full wing where a
    synthetic jet issues an unsteady crossflow jet.}
  \end{tabular}

  \begin{tabular}{cc}
    \includegraphics[width=0.383\linewidth]{images/Crossflow} &
    \includegraphics[width=0.517\linewidth]{images/WingWake}
  \end{tabular}
  \parbox{0.95\linewidth}{\footnotesize ParaView visualizations run in situ
    with large-scale PHASTA simulations. On the left is a mesh of 3.3 billion
    tetrahedra simulating the flow over a full wing where a synthetic
    jet issues an unsteady crossflow jet (run on 160 thousand MPI
    processes). On the right is a 1.3 billion element mesh simulating the
    wake of a deflected wing flap (run on 256 thousand MPI processes).
    Images courtesy of Michel Rasquin, Argonne National Laboratory.}
\end{inlinefig}

In this section we discuss visualizing large meshes like these using the
parallel visualization capabilities of ParaView.  This section is less
``hands-on'' than the previous section.  Instead, you will learn the
conceptual knowledge needed to perform large parallel visualization.  We
present the basic ParaView architecture and parallel algorithms and
demonstrate how to apply this knowledge.


\section{ParaView Architecture}

ParaView is designed as a three-tier client-server architecture.  The three
logical units of ParaView are as follows.

\index{ParaView Server}
\begin{description}
\item[Data Server] \index{data server} The unit responsible for data
  reading, filtering, and writing.  All of the pipeline objects seen in the
  pipeline browser are contained in the data server.  The data server can
  be parallel.
\item[Render Server] \index{render server}The unit responsible for
  rendering.  The render server can also be parallel, in which case
  built-in parallel rendering is also enabled.
\item[Client] \index{client}The unit responsible for establishing
  visualization.  The client controls the object creation, execution, and
  destruction in the servers, but does not contain any of the data (thus
  allowing the servers to scale without bottlenecking on the client).  If
  there is a GUI, that is also in the client.  The client is always a
  serial application.
\end{description}

These logical units need not be physically separated.  Logical units are
often embedded in the same application, removing the need for any
communication between them.  There are three modes in which you can run
ParaView.

\begin{inlinefig}
  \includegraphics[scale=\bbscale]{images/RunModeStandalone}
\end{inlinefig}

The first mode, which you are already familiar with, is
\keyterm{standalone} mode.  In standalone mode, the client, data server,
and render server are all combined into a single serial application.  When
you run the \progname{paraview} application, you are automatically connected
to a \keyterm{builtin} server so that you are ready to use the full
features of ParaView.

\begin{inlinefig}
  \includegraphics[scale=\bbscale]{images/RunModeClientServer}
\end{inlinefig}

The second mode is \keyterm{client-server} mode.  In client-server mode,
you execute the \progname{pvserver} program on a parallel machine and
connect to it with the \progname{paraview} client application.  The
\progname{pvserver} program has both the data server and render server
embedded in it, so both data processing and rendering take place there.
The client and server are connected via a socket, which is assumed to be a
relatively slow mode of communication, so data transfer over this socket is
minimized.

\begin{inlinefig}
  \includegraphics[scale=\bbscale]{images/RunModeClientRenderDataServer}
\end{inlinefig}

The third mode is \keyterm{client--render-server--data-server} mode.  In this
mode, all three logical units are running in separate programs.  As before,
the client is connected to the render server via a single socket
connection.  The render server and data server are connected by many socket
connections, one for each process in the render server.  Data transfer over
the sockets is minimized.

Although the client--render-server--data-server mode is supported, we almost
never recommend using it.  The original intention of this mode is to take
advantage of heterogeneous environments where one might have a large,
powerful computational platform and a second smaller parallel machine with
graphics hardware in it.  However, in practice we find any benefit is
almost always outstripped by the time it takes to move geometry from the
data server to the render server.  If the computational platform is much
bigger than the graphics cluster, then use software rendering on the large
computational platform.  If the two platforms are about the same size, just
perform all the computation on the graphics cluster.

\section{Setting up a ParaView Server}

Setting up standalone ParaView is usually trivial.  You can download a
pre-compiled binary, install it on your computer, and go.  Setting up a
ParaView server, however, is intrinsically harder.  First, you will have to
compile the server yourself.  Because there are so many versions of MPI,
the library that makes parallel programming possible, and each version of
MPI may be altered to match the communication hardware of a parallel
computer, it is impossible to reliably provide binary files to match every
possible combination.

To compile ParaView on a parallel machine, you will need the following.

\begin{itemize}
\item CMake cross-platform build setup tool
  (\href{http://www.cmake.org}{www.cmake.org})
\item MPI
\item OpenGL (or use Mesa 3D \href{http://www.mesa3d.org}{www.mesa3d.org}
  if otherwise unavailable)
\item Qt 4.7 (optional)
\item Python +NumPy +Matplotlib (optional)
\end{itemize}

Compiling without one of the optional libraries means a feature will not be
available.  Compiling without Qt means that you will not have the GUI
application and compiling without Python means that you will not have
scripting available.

To compile ParaView, you first run CMake, which will allow you to set up
compilation parameters and point to libraries on your system.  This will
create the make files that you then use to build ParaView.  For more
details on building a ParaView server, see the ParaView Wiki.

{
  \footnotesize
  \href{http://www.paraview.org/Wiki/Setting_up_a_ParaView_Server#Compiling}{http://www.paraview.org/Wiki/Setting\_up\_a\_ParaView\_Server\#Compiling}
}

Running ParaView in parallel is also intrinsically more difficult than
running the standalone client.  It typically involves a number of steps
that change depending on the hardware you are running on: logging in to
remote computers, allocating parallel nodes, launching a parallel program,
establishing connections, and tunneling through firewalls.

Client-server connections are established through the \progname{paraview}
client application.  You connect to servers and disconnect from servers
with the \connect and \disconnect buttons.  When ParaView starts, it
automatically connects to the builtin server.  It also connects to
builtin whenever it disconnects~\disconnect from a server.

When you hit the \connect button, ParaView presents you with a dialog box
containing a list of known servers you may connect to.  This list of
servers can be both site- and user-specific.

\begin{inlinefig}
  \includegraphics[width=.75\scw]{images/ChooseServer}
\end{inlinefig}

You can specify how to connect to a server either through the GUI by
pressing the \gui{Add Server} button or through an XML definition file.
There are several options for specifying server connections, but ultimately
you are giving ParaView a command to launch the server and a host to
connect to after it is launched.  Consult the ParaView Wiki for more
information on establishing server connections.

{
  \footnotesize
  \href{http://www.paraview.org/Wiki/Setting_up_a_ParaView_Server#Running_the_Server}{http://www.paraview.org/Wiki/Setting\_up\_a\_ParaView\_Server\#Running\_the\_Server}
}


\section{Parallel Visualization Algorithms}

We are fortunate in that once you have a parallel framework, performing
parallel visualization tasks is straightforward.  The data we deal with is
contained in a mesh, which means the data is already broken into little
pieces by the cells.  We can do visualization on a distributed parallel
machine by first dividing the cells among the processes.  For
demonstrative purposes, consider this very simplified mesh.

\begin{inlinefig}
  \includegraphics[scale=\bbscale]{images/ParallelExampleMesh}
\end{inlinefig}

Now let us say we want to perform visualizations on this mesh using three
processes.  We can divide the cells of the mesh as shown below with the
blue, yellow, and pink regions.

\begin{inlinefig}
  \includegraphics[scale=\bbscale]{images/ParallelExamplePartitions}
\end{inlinefig}

Once partitioned, some visualization algorithms will work by simply
allowing each process to independently run the algorithm on its local
collection of cells.  For example, take clipping (which is demonstrated in
multiple exercises including \ref{ex:UsingMultipleViews}).  Let us say that
we define a clipping plane and give that same plane to each of the
processes.

\begin{inlinefig}
  \includegraphics[scale=\bbscale]{images/ParallelExampleClip1}
\end{inlinefig}

Each process can independently clip its cells with this plane.  The end
result is the same as if we had done the clipping serially.  If we were to
bring the cells together (which we would never actually do for large data
for obvious reasons) we would see that the clipping operation took place
correctly.

\begin{inlinefig}
  \includegraphics[scale=\bbscale]{images/ParallelExampleClip2}
\end{inlinefig}
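
The idea can be sketched in a few lines of Python.  This is only an
illustration under simplifying assumptions, not ParaView's implementation:
the mesh is a row of 1D cells, the ``processes'' are ordinary lists, and
\texttt{clip\_cells} is a hypothetical helper written for this example.

```python
def clip_cells(cells, plane_x):
    """Clip 1D cells (lo, hi) against the plane x = plane_x, keeping x >= plane_x."""
    clipped = []
    for lo, hi in cells:
        if hi <= plane_x:
            continue  # the cell lies entirely on the discarded side
        clipped.append((max(lo, plane_x), hi))  # cut the cell if the plane crosses it
    return clipped


# A toy mesh of 12 unit-length cells along the x axis.
mesh = [(float(i), float(i + 1)) for i in range(12)]

# Divide the cells among three hypothetical processes.
partitions = [mesh[0:4], mesh[4:8], mesh[8:12]]

plane_x = 5.5

# Reference answer: clip the whole mesh in one place.
serial_result = clip_cells(mesh, plane_x)

# Each "process" clips its own cells without looking at any other partition.
parallel_pieces = [clip_cells(part, plane_x) for part in partitions]
parallel_result = [cell for piece in parallel_pieces for cell in piece]

print(sorted(parallel_result) == sorted(serial_result))
print([len(piece) for piece in parallel_pieces])
```

The pieces together match the serial result even though no process ever
looked at another partition.  Note also that the first piece ends up empty,
so the work is no longer evenly spread after the clip.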


\section{Ghost Levels}

Unfortunately, blindly running visualization algorithms on partitions of
cells does not always result in the correct answer.  As a simple example,
consider the \keyterm{external faces} algorithm.  The external faces
algorithm finds all cell faces that belong to only one cell, thereby
identifying the boundaries of the mesh. What happens when we run external
faces independently on our partitions?

\begin{inlinefig}
  \includegraphics[scale=\bbscale]{images/ParallelExampleExternalFaces1}
\end{inlinefig}

Oops.  We see that when all the processes ran the external faces algorithm
independently, many internal faces were incorrectly identified as being
external.  This happens where a cell in one partition has a neighbor in
another partition.  A process has no access to cells in other partitions,
so there is no way of knowing that these neighboring cells exist.

The solution employed by ParaView and other parallel visualization systems
is to use \keyterm{ghost cells} (sometimes also called
\keyterm{halo regions}).  Ghost cells are cells that are held in one
process but actually belong to another.  To use ghost cells, we first have
to identify all the neighboring cells in each partition.  We then copy
these neighboring cells to the partition and mark them as ghost cells, as
indicated with the gray colored cells in the following example.

\begin{inlinefig}
  \includegraphics[scale=\bbscale]{images/ParallelExampleExternalFaces2}
\end{inlinefig}

When we run the external faces algorithm with the ghost cells, we see that
we are still incorrectly identifying some internal faces as external.
However, all of these misclassified faces are on ghost cells, and each face
inherits the ghost status of the cell it came from.  ParaView then strips
off the ghost faces and we are left with the correct answer.

In this example we have shown one layer of ghost cells: only those cells
that are direct neighbors of the partition's cells.  ParaView also has the
ability to retrieve multiple layers of ghost cells, where each layer
contains the neighbors of the previous layer not already contained in a
lower ghost layer or the original data itself.  This is useful when we have
cascading filters that each require their own layer of ghost cells.  They
each request an additional layer of ghost cells from upstream, and then
remove a layer from the data before sending it downstream.
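
The following Python sketch reproduces the external faces example on a
small 2D grid of quadrilateral cells, where the ``faces'' are cell edges.
It is a toy model under stated assumptions rather than ParaView's code:
the grid size, the column-wise partitioning, and the helper names
(\texttt{cell\_faces}, \texttt{external\_faces}, \texttt{ghost\_layer}) are
all invented for illustration.

```python
from collections import Counter


def cell_faces(cell):
    """Return the four edges ("faces" in 2D) of the unit quad cell (i, j)."""
    i, j = cell
    return [
        ((i, j), (i + 1, j)),          # bottom
        ((i, j + 1), (i + 1, j + 1)),  # top
        ((i, j), (i, j + 1)),          # left
        ((i + 1, j), (i + 1, j + 1)),  # right
    ]


def external_faces(owned, ghosts=()):
    """Faces used by exactly one local cell, keeping only faces of owned cells."""
    local_cells = list(owned) + list(ghosts)
    counts = Counter(f for c in local_cells for f in cell_faces(c))
    # Iterating over owned cells only is what strips off the ghost faces.
    return {f for c in owned for f in cell_faces(c) if counts[f] == 1}


def ghost_layer(owned, mesh):
    """One layer of ghost cells: mesh cells sharing a face with an owned cell."""
    ghosts = set()
    for i, j in owned:
        for n in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)):
            if n in mesh and n not in owned:
                ghosts.add(n)
    return ghosts


# A 6 x 3 grid of cells split into three blocks of two columns each.
mesh = {(i, j) for i in range(6) for j in range(3)}
partitions = [{c for c in mesh if lo <= c[0] < lo + 2} for lo in (0, 2, 4)]

serial = external_faces(mesh)
naive = set().union(*(external_faces(p) for p in partitions))
with_ghosts = set().union(
    *(external_faces(p, ghost_layer(p, mesh)) for p in partitions)
)

print(len(serial), len(naive - serial), with_ghosts == serial)
```

Without ghost cells, the faces along the two partition boundaries are
wrongly reported as external.  With one ghost layer, each process sees the
neighbor across the boundary, and the combined result matches the serial
one.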

\section{Data Partitioning}

Since we are breaking up and distributing our data, it is prudent to
address the ramifications of how we partition the data.  The data shown in
the previous example has a \keyterm{spatially coherent} partitioning.  That
is, all the cells of each partition are located in a compact region of
space.  There are other ways to partition data.  For example, you could
have a random partitioning.

\begin{inlinefig}
  \includegraphics[scale=\bbscale]{images/ParallelExampleRandomPartition1}
\end{inlinefig}

Random partitioning has some nice features.  It is easy to create and is
friendly to load balancing.  However, a serious problem exists with respect
to ghost cells.

\begin{inlinefig}
  \includegraphics[scale=\bbscale]{images/ParallelExampleRandomPartition2}
\end{inlinefig}

In this example, we see that a single level of ghost cells nearly
replicates the entire data set on all processes.  We have thus removed any
advantage we had with parallel processing.  Because ghost cells are used so
frequently, random partitioning is not used in ParaView.
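
To get a feel for the difference, the sketch below counts how many ghost
cells one ghost level requires under each partitioning of a small 2D grid.
The grid size, the number of partitions, the fixed random seed, and the
\texttt{ghost\_count} helper are assumptions made purely for this
illustration.

```python
import random


def ghost_count(owned, mesh):
    """Number of cells outside `owned` that share a face with a cell in `owned`."""
    ghosts = set()
    for i, j in owned:
        for n in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)):
            if n in mesh and n not in owned:
                ghosts.add(n)
    return len(ghosts)


nx, ny, nparts = 12, 12, 3
cells = [(i, j) for i in range(nx) for j in range(ny)]
mesh = set(cells)

# Spatially coherent: three vertical strips, each four columns wide.
coherent = [{c for c in cells if c[0] * nparts // nx == p} for p in range(nparts)]

# Random: shuffle the cells (fixed seed for repeatability) and deal them out evenly.
shuffled = cells[:]
random.Random(0).shuffle(shuffled)
scattered = [set(shuffled[p::nparts]) for p in range(nparts)]

coherent_ghosts = [ghost_count(part, mesh) for part in coherent]
random_ghosts = [ghost_count(part, mesh) for part in scattered]

print("cells per process:", [len(part) for part in coherent])
print("ghosts, coherent :", coherent_ghosts)
print("ghosts, random   :", random_ghosts)
```

Each process owns the same number of cells in both cases, but the strips
need ghost cells only along their shared edges, whereas the scattered
partitions need a ghost copy of most of the cells they do not own.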

\section{D3 Filter}

The previous sections described the importance of load balancing and ghost
levels for parallel visualization.  This section describes how to achieve
both.

Load balancing and ghost cells are handled automatically by ParaView when
you are reading structured data (image data, rectilinear grid, and
structured grid).  The implicit topology makes it easy to break the data
into spatially coherent chunks and identify where neighboring cells are
located.

It is an entirely different matter when you are reading in unstructured
data (poly data and unstructured grid).  There is no implicit topology and
no neighborhood information available.  ParaView is at the mercy of how the
data was written to disk.  Thus, when you read in unstructured data there
is no guarantee about how well load balanced your data will be.  It is also
unlikely that the data will have ghost cells available, which means that
the output of some filters may be incorrect.

Fortunately, ParaView has a filter that will both balance your unstructured
data and create ghost cells.  This filter is called D3, which is short for
distributed data decomposition.  Using D3 is easy; simply attach the filter
(located in \gui{Filters} \ra \gui{Alphabetical} \ra \gui{D3}) to whatever
data you wish to repartition.

\begin{inlinefig}
  \includegraphics[height=.3\linewidth]{images/D3ExampleBefore}
  \includegraphics[height=.3\linewidth]{images/D3ExampleAfter}
\end{inlinefig}

The most common use case for D3 is to attach it directly to your
unstructured grid reader.  Regardless of how well load balanced the incoming
data might be, it is important to be able to retrieve ghost cells so that
subsequent filters will generate the correct data.  The example above shows
a cutaway of the extract surface filter on an unstructured grid.  On the
left we see that there are many faces improperly extracted because we are
missing ghost cells.  On the right the problem is fixed by first using the
D3 filter.


\section{Matching Job Size to Data Size}

\emph{How many cores should I have in my ParaView server?}  This is a
common question with many important ramifications.  It is also an
enormously difficult question.  The answer depends on a wide variety of
factors including what hardware each processor has, how much data is being
processed, what type of data is being processed, what type of visualization
operations are being done, and your own patience.

Consequently, we have no hard answer.  We do however have several rules of thumb.

\textbf{If you are loading structured data} (image data, rectilinear grid,
structured grid), try to have a minimum of one core per 20 million
cells.  If you can spare the cores, one core for every 5 to 10
million cells is usually plenty.

\textbf{If you are loading unstructured data} (poly data, unstructured
grid), try to have a minimum of one core per 1 million cells.  If you
can spare the cores, one core for every 250 to 500 thousand cells
is usually plenty.

As stated before, these are just rules of thumb, not absolutes.  You should
always experiment to gauge what your ratio of cores to data size should
be.  And, of course, there will always be times when the data you want to
load will stretch the limit of the resources you have available.  When this
happens, you will want to make sure that you avoid data explosion and that
you cull your data quickly.
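
The rules of thumb above amount to simple arithmetic, sketched here in
Python.  The \texttt{suggested\_cores} function and the \texttt{RULES} table
are hypothetical names for this example; the numbers are the ones quoted in
this section, and the result is a starting point for experimentation rather
than a guarantee.

```python
# Cells per core from the rules of thumb in this section:
# (most cells per core, comfortable low, comfortable high).
RULES = {
    "structured": (20_000_000, 5_000_000, 10_000_000),
    "unstructured": (1_000_000, 250_000, 500_000),
}


def ceil_div(a, b):
    """Integer division rounded up."""
    return -(-a // b)


def suggested_cores(num_cells, kind):
    """Return (minimum cores, (comfortable low, comfortable high))."""
    most_per_core, plenty_lo, plenty_hi = RULES[kind]
    minimum = ceil_div(num_cells, most_per_core)
    # More cells per core means fewer cores, so the bounds swap.
    comfortable = (ceil_div(num_cells, plenty_hi), ceil_div(num_cells, plenty_lo))
    return minimum, comfortable


# One billion structured cells.
print(suggested_cores(1_000_000_000, "structured"))    # (50, (100, 200))

# 3.3 billion unstructured cells.
print(suggested_cores(3_300_000_000, "unstructured"))  # (3300, (6600, 13200))
```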


\section{Avoiding Data Explosion}
\label{sec:AvoidingDataExplosion}

The pipeline model that ParaView presents is very convenient for
exploratory visualization.  The loose coupling between components provides
a very flexible framework for building unique visualizations, and the
pipeline structure allows you to tweak parameters quickly and easily.

The downside of this loose coupling is a potentially larger memory
footprint.  Each stage of this pipeline maintains its own copy of the data.
Whenever possible, ParaView performs \keyterm{shallow copies} of the data
so that different stages of the pipeline point to the same block of data in
memory.  However, any filter that creates new data or changes the values or
topology of the data must allocate new memory for the result.  If ParaView
is filtering a very large mesh, inappropriate use of filters can quickly
deplete all available memory.  Therefore, when visualizing large data sets,
it is important to understand the memory requirements of filters.

Please keep in mind that the following advice is intended \emph{only for
  when dealing with very large amounts of data and the remaining available
  memory is low}.  When you are not in danger of running out of memory,
ignore all of the following advice.

When dealing with structured data, it is critically important to know what
filters will change the data to unstructured.  Unstructured data has a much
higher memory footprint, per cell, than structured data because the
topology must be explicitly written out.  There are many filters in
ParaView that will change the topology in some way, and these filters will
write out the data as an unstructured grid, because that is the only data
set that will handle any type of topology that is generated.  The filters in
the following list write out a new unstructured topology in their output
that is roughly equivalent to the input.  These filters should \emph{never}
be used with structured data and should be used with caution on
unstructured data.

%TODO: there are surely more filters in each category now

\ifthenelse{\boolean{savetrees}}{\noindent\begin{minipage}{\linewidth}}{}
\begin{multicols}{2}
  \begin{itemize}
  \item \gui{Append Datasets}
  \item \gui{Append Geometry}
  \item \gui{Clean}
  \item \gui{Clean to Grid}
  \item \gui{Connectivity}
  \item \gui{D3}
  \item \gui{Delaunay 2D/3D}
  \item \gui{Extract Edges}
  \item \gui{Linear Extrusion}
  \item \gui{Loop Subdivision}
  \item \gui{Reflect}
  \item \gui{Rotational Extrusion}
  \item \gui{Shrink}
  \item \gui{Smooth}
  \item \gui{Subdivide}
  \item \gui{Tessellate}
  \item \gui{Tetrahedralize}
  \item \gui{Triangle Strips}
  \item \gui{Triangulate}
  \end{itemize}
\end{multicols}
\ifthenelse{\boolean{savetrees}}{\end{minipage}}{}

Technically, the \gui{Ribbon} and \gui{Tube} filters should fall into this
list.  However, as they only work on 1D cells in poly data, the input data
is usually small and of little concern.
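
To make the per-cell cost of explicit topology concrete, here is a rough
back-of-the-envelope estimate in Python.  The storage layout is an
assumption chosen for illustration (4-byte point coordinates, 8-byte point
ids, one offset and one type byte per cell); actual sizes depend on how
ParaView was built and on the data itself.  Field arrays are left out
because they are needed in either representation.

```python
def unstructured_topology_bytes(nx, ny, nz, id_bytes=8, coord_bytes=4):
    """Rough bytes to store an nx*ny*nz-cell hexahedral grid explicitly."""
    num_cells = nx * ny * nz
    num_points = (nx + 1) * (ny + 1) * (nz + 1)
    points = num_points * 3 * coord_bytes    # x, y, z for every point
    connectivity = num_cells * 8 * id_bytes  # eight point ids per hexahedron
    offsets = num_cells * id_bytes           # start of each cell in the connectivity
    cell_types = num_cells                   # one byte per cell
    return points + connectivity + offsets + cell_types


def image_topology_bytes():
    """Image data needs only an extent (6 ints), an origin, and a spacing."""
    return 6 * 4 + 3 * 8 + 3 * 8


n = 1000  # a 1000^3 grid has one billion cells
bytes_per_cell = unstructured_topology_bytes(n, n, n) / n**3

print("image data, whole grid :", image_topology_bytes(), "bytes")
print("unstructured, per cell :", round(bytes_per_cell), "bytes")
```

Under these assumptions the implicit description stays a few dozen bytes
regardless of grid size, while the explicit one costs tens of bytes for
every cell, which for a billion cells is tens of gigabytes before any
field data is counted.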

A similar set of filters also outputs unstructured grids, but these filters
also tend to reduce some of the data.  Be aware though that this data reduction
is often smaller than the overhead of converting to unstructured data.
Also note that the reduction is often not well balanced.  It is possible
(often likely) that a single process may not lose any cells.  Thus, these
filters should be used with caution on unstructured data and extreme
caution on structured data.

\ifthenelse{\boolean{savetrees}}{\noindent\begin{minipage}{\linewidth}}{}
\begin{multicols}{2}
  \begin{itemize}
  \item \gui{Clip}~\clip
  \item \gui{Decimate}
  \item \gui{Extract Cells by Region}
  \item \gui{Extract Selection}~\extractSelection
  \item \gui{Quadric Clustering}
  \item \gui{Threshold}~\threshold
  \end{itemize}
\end{multicols}
\ifthenelse{\boolean{savetrees}}{\end{minipage}}{}

Similar to the items in the preceding list, \gui{Extract
  Subset}~\extractSubset performs data
reduction on a structured data set, but also outputs a structured data set.
So the warning about creating new data still applies, but you do not have
to worry about converting to an unstructured grid.

This next set of filters also outputs unstructured data, but it also
performs a reduction on the dimension of the data (for example 3D to 2D),
which results in a much smaller output.  Thus, these filters are usually
safe to use with unstructured data and require only mild caution with
structured data.

\ifthenelse{\boolean{savetrees}}{\noindent\begin{minipage}{\linewidth}}{}
\begin{multicols}{2}
  \begin{itemize}
  \item \gui{Cell Centers}
  \item \gui{Contour}~\contour
  \item \gui{Extract CTH Fragments}
  \item \gui{Extract CTH Parts}
  \item \gui{Extract Surface}
  \item \gui{Feature Edges}
  \item \gui{Mask Points}
  \item \gui{Outline (curvilinear)}
  \item \gui{Slice}~\slice
  \item \gui{Stream Tracer}~\streamTracer
  \end{itemize}
\end{multicols}
\ifthenelse{\boolean{savetrees}}{\end{minipage}}{}

These filters do not change the connectivity of the data at all.  Instead,
they only add field arrays to the data.  All the existing data is shallow
copied.  These filters are usually safe to use on all data.

\ifthenelse{\boolean{savetrees}}{\noindent\begin{minipage}{\linewidth}}{}
\begin{multicols}{2}
  \begin{itemize}
  \item \gui{Block Scalars}
  \item \gui{Calculator}~\calculator
  \item \gui{Cell Data to Point Data}
  \item \gui{Curvature}
  \item \gui{Elevation}
  \item \gui{Generate Surface Normals}
  \item \gui{Gradient}
  \item \gui{Level Scalars}
  \item \gui{Median}
  \item \gui{Mesh Quality}
  \item \gui{Octree Depth Limit}
  \item \gui{Octree Depth Scalars}
  \item \gui{Point Data to Cell Data}
  \item \gui{Process Id Scalars}
  \item \gui{Python Calculator}
  \item \gui{Random Vectors}
  \item \gui{Resample with dataset}
  \item \gui{Surface Flow}
  \item \gui{Surface Vectors}
  \item \gui{Texture Map to...}
  \item \gui{Transform}
  \item \gui{Warp (scalar)}
  \item \gui{Warp (vector)}~\warp
  \end{itemize}
\end{multicols}
\ifthenelse{\boolean{savetrees}}{\end{minipage}}{}

This final set consists of filters that either add no data to the output
(all data of consequence is shallow copied) or the data they add is
generally independent of the size of the input.  These are almost always
safe to add under any circumstances (although they may take a lot of time).

\ifthenelse{\boolean{savetrees}}{\noindent\begin{minipage}{\linewidth}}{}
\begin{multicols}{2}
  \begin{itemize}
  \item \gui{Annotate Time}
  \item \gui{Append Attributes}
  \item \gui{Extract Block}
  \item \gui{Extract Datasets}
  \item \gui{Extract Level}~\extractGroup
  \item \gui{Glyph}~\glyph
  \item \gui{Group Datasets}~\group
  \item \gui{Histogram}~\histogram
  \item \gui{Integrate Variables}
  \item \gui{Normal Glyphs}
  \item \gui{Outline}
  \item \gui{Outline Corners}
  \item \gui{Plot Global Variables Over Time}
  \item \gui{Plot Over Line}~\plotOverLine
  \item \gui{Plot Selection Over Time}~\plotSelectionOverTime
  \item \gui{Probe Location}~\probe
  \item \gui{Temporal Shift Scale}
  \item \gui{Temporal Snap-to-Time-Steps}
  \item \gui{Temporal Statistics}
  \end{itemize}
\end{multicols}
\ifthenelse{\boolean{savetrees}}{\end{minipage}}{}

There are a few special case filters that do not fit well into any of the
previous classes.  Some of the filters, currently \gui{Temporal
  Interpolator} and \gui{Particle Tracer}, perform calculations based on
how data changes over time.  Thus, these filters may need to load data for
two or more instances of time, which can at least double the amount of data
needed in memory.  The \gui{Temporal Cache} filter will also hold data for
multiple instances of time.  Also keep in mind that some of the temporal
filters such as the temporal statistics and the filters that plot over time
may need to iteratively load all data from disk.  Thus, they may take an
impractically long amount of time even though they do not require any extra
memory.

The \gui{Programmable Filter}~\icon{pqProgrammableFilter24} is also a
special case that is impossible to classify.  Since this filter does
whatever it is programmed to do, it can fall into any one of these
categories.

\section{Culling Data}
\label{sec:CullingData}

When dealing with large data, it is clearly best to cull out data whenever
possible, and the earlier the better.  Most large data starts as 3D
geometry and the desired geometry is often a surface.  As surfaces usually
have a much smaller memory footprint than the volumes that they are derived
from, it is best to convert to a surface as early as possible.  Once you do that, you can
apply other filters in relative safety.

A very common visualization operation is to extract isosurfaces from a
volume using the \gui{Contour}~\contour filter.  The \gui{Contour} filter
usually outputs geometry much smaller than its input.  Thus, the
\gui{Contour} filter should be applied early if it is to be used at all.
Be careful when setting up the parameters to the \gui{Contour} filter
because it still is possible for it to generate a lot of data.  This
obviously can happen if you specify many isosurface values.  High
frequencies such as noise around an isosurface value can also cause a
large, irregular surface to form.

Another way to peer inside of a volume is to perform a \gui{Slice}~\slice
on it.  The \gui{Slice}~\slice filter will intersect a volume with a plane
and allow you to see the data in the volume where the plane intersects.  If
you know the relative location of an interesting feature in your large data
set, slicing is a good way to view it.

If you have little \emph{a-priori} knowledge of your data and would like to
explore the data without paying the memory and processing time for the full
data set, you can use the \gui{Extract Subset}~\extractSubset filter to
subsample the data.  The subsampled data can be dramatically smaller than
the original data and should still be well load balanced.  Of course, be
aware that you may miss small features if the subsampling steps over them
and that once you find a feature you should go back and visualize it with
the full data set.
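
The size reduction from subsampling is easy to estimate ahead of time.  The
sketch below assumes that subsampling keeps the points at indices 0,
\texttt{stride}, 2\,\texttt{stride}, and so on along each axis of a
structured grid; the function names are hypothetical and the grid
dimensions are made up for the example.

```python
def subsampled_dims(dims, stride):
    """Point dimensions left after keeping every `stride`-th point per axis."""
    return tuple((d - 1) // stride + 1 for d in dims)


def num_points(dims):
    """Total number of points in a structured grid with the given dimensions."""
    nx, ny, nz = dims
    return nx * ny * nz


full = (1001, 1001, 1001)  # about one billion points
for stride in (2, 4, 8):
    small = subsampled_dims(full, stride)
    reduction = num_points(full) / num_points(small)
    print(f"stride {stride}: dims {small}, about {reduction:.0f}x fewer points")
```

Because the stride applies along all three axes, the data shrinks roughly
with the cube of the stride, so even a modest stride frees a large amount
of memory.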

There are also several filters that can pull out a subset of a volume:
\gui{Clip}~\clip, \gui{Threshold}~\threshold, \gui{Extract Selection}, and
\gui{Extract Subset}~\extractSubset can all extract cells based on some
criterion.  Be aware, however, that the extracted cells are almost never
well balanced; expect some processes to have no cells removed.  Also, all
of these filters with the exception of \gui{Extract Subset}~\extractSubset
will convert structured data types to unstructured grids.  Therefore, they
should not be used unless the extracted cells number at least an order of
magnitude fewer than those in the source data.

When possible, replace the use of a filter that extracts 3D data with one
that will extract 2D surfaces.  For example, if you are interested in a
plane through the data, use the \gui{Slice}~\slice filter rather than the
\gui{Clip}~\clip filter.  If you are interested in knowing the location of
a region of cells containing a particular range of values, consider using
the \gui{Contour}~\contour filter to generate surfaces at the ends of the
range rather than extract all of the cells with the
\gui{Threshold}~\threshold filter.  Be aware that substituting filters can
have an effect on downstream filters.  For example, running the
\gui{Histogram}~\histogram filter after
\gui{Threshold}~\threshold will have an entirely different effect than
running it after the roughly equivalent \gui{Contour}~\contour filter.


\section{Keeping Track of Memory}

\index{memory inspector|(}

When working with very large models, it is important to keep track of
memory usage on your computer. One of the most common and frustrating
problems encountered with large models is running out of memory. This in
turn will lead to thrashing in the virtual memory system or an outright
program fault.

Sections \ref{sec:AvoidingDataExplosion} and \ref{sec:CullingData} provide
suggestions to reduce your memory usage. Even so, it is wise to keep an eye
on the memory available in your system. ParaView provides a tool called the
\keyterm{memory inspector} designed to do just that.

\begin{inlinefig}
  \includegraphics[width=\scw]{images/MemoryInspector}
\end{inlinefig}

To access the memory inspector, select in the menu bar \gui{View} \ra
\gui{Memory Inspector}. The memory inspector provides information for both
the client you are running on and any server you might be connected to. It
will tell you the total amount of memory used on the system and the amount
of memory ParaView is using. For servers containing multiple nodes,
information is given both for the conglomerate job and for each individual
node. Note that a memory issue in any single node can cause a problem for
the entire ParaView job.

\index{memory inspector|)}


\section{Rendering}

\index{rendering|(}

Rendering is the process of synthesizing the images that you see based on
your data.  The ability to effectively interact with your data depends
highly on the speed of the rendering.  Thanks to advances in 3D hardware
acceleration, fueled by the computer gaming market, we have the ability to
render 3D quickly even on moderately priced computers.  But, of course, the
time needed to render grows with the amount of data being rendered.
As data gets bigger, the rendering process naturally gets slower.

\index{rendering!interactive|see{interactive render}}
\index{rendering!still|see{still render}}

To ensure that your visualization session remains interactive, ParaView
supports two modes of rendering that it switches between automatically as
necessary.  In the first mode, \keyterm{still render}, the data is rendered
at the highest level of detail.  This rendering mode ensures that all of
the data is represented accurately.  In the second mode,
\keyterm{interactive render}, speed takes precedence over accuracy.  This
rendering mode endeavors to provide a quick rendering rate regardless of
data size.

While you are interacting with a 3D view, for example rotating, panning, or
zooming with the mouse, ParaView uses an interactive render.  A high frame
rate is necessary to make these interactions usable, and fine details matter
less because each frame is immediately replaced with a new rendering while
the interaction is occurring.  Whenever you are not interacting with the 3D
view, ParaView uses a still render so that the full detail of
the data is available as you study it.  As you drag your mouse in a 3D view
to move the data, you may see an approximate rendering while you are moving
the mouse, but the full detail will be presented as soon as you release the
mouse button.

The interactive render is a compromise between speed and accuracy.  As
such, many of the rendering parameters concern when and how lower levels of
detail are used.

\subsection{Basic Rendering Settings}
\label{sec:BasicRenderingSettings}

Some of the most important rendering options are the LOD parameters.
During interactive rendering, the geometry may be replaced with a lower
\keyterm{level of detail} (\keyterm{LOD}), an approximate geometry with
fewer polygons.

\begin{inlinefig}
  \includegraphics[width=.28\linewidth]{images/GeometricLODFull}
  \includegraphics[width=.28\linewidth]{images/GeometricLOD50}
  \includegraphics[width=.28\linewidth]{images/GeometricLOD10}
\end{inlinefig}

The resolution of the geometric approximation can be controlled. In the
preceding images, the left image is the full resolution; the middle image
is the default decimation for interactive rendering, and the right image is
ParaView's maximum decimation setting.

The 3D rendering parameters are located in the settings dialog box which is
accessed in the menu from \gui{Edit} \ra \gui{Settings} (\gui{ParaView} \ra
\gui{Preferences} on the Mac).  The rendering options in the dialog
are in the \gui{Render View} tab.

\begin{inlinefig}
  \includegraphics[width=0.8\scw]{images/SettingsRendering}
\end{inlinefig}

The options pertaining to the geometric decimation for interactive
rendering are located in a section labeled \gui{Interactive Rendering
  Options}. Some of these options are considered advanced, so to access
them you have to either toggle on the advanced options with the
\icon{pqAdvanced26} button or search for the option using the edit box at
the top of the dialog. The interactive rendering options include the
following.

\begin{itemize}
\item \index{LOD Threshold} Set the data size at which to use a decimated
  geometry in interactive rendering. If the geometry size is under this
  threshold, ParaView always renders the full geometry. Increase this value
  if you have a decent graphics card that can handle larger data. Try
  decreasing this value if your interactive renders are too slow.
\item \index{LOD Resolution} Set the factor that controls how large the
  decimated geometry should be. This control is set to a value between 0
  and 1. 0 produces a very small number of triangles but possibly with a
  lot of distortion. 1 produces more detailed surfaces but with larger
  geometry. \icon{pqAdvanced26}
\item \index{interactive render!delay} Add a delay between an interactive
  render and a still render. ParaView usually performs a still render
  immediately after an interactive motion is finished (for example,
  releasing the mouse button after a rotation). This option can add a delay
  that can give you time to start a second interaction before the still
  render starts, which is helpful if the still render takes a long time to
  complete. \icon{pqAdvanced26}
\item \index{interactive render!outline} Use an outline in place of
  decimated geometry. The outline is an alternative for when the geometry
  decimation takes too long or still produces too much geometry. However, it
  is more difficult to interact with just an outline.
\end{itemize}
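
The way the threshold and outline options interact can be sketched in a few
lines of Python. This is only an illustration of the decision described
above, not ParaView's implementation. The function name, its parameters, the
use of megabytes as the size unit, and the behavior at exactly the threshold
are all assumptions made for the example.

```python
def interactive_geometry(geometry_mb, lod_threshold_mb, use_outline=False):
    """Pick what to draw during an interactive render (illustrative only).

    geometry_mb      -- size of the full-resolution geometry (assumed MB)
    lod_threshold_mb -- the LOD Threshold setting (assumed MB)
    use_outline      -- whether the outline option is enabled
    """
    # Over the threshold, fall back to the outline or a decimated surface.
    if geometry_mb > lod_threshold_mb:
        return "outline" if use_outline else "decimated"
    # Otherwise the full geometry is rendered even while interacting.
    return "full"


print(interactive_geometry(5, 20))          # prints "full"
print(interactive_geometry(500, 20))        # prints "decimated"
print(interactive_geometry(500, 20, True))  # prints "outline"
```

Still renders are unaffected by this choice; they always use the full
geometry.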

ParaView contains many more rendering settings. Here is a summary of some
other settings that can affect the rendering performance regardless of
whether ParaView is run in client-server mode or not. These options are
spread among several categories, and several are considered advanced.

\begin{description}
\item[\gui{Geometry Mapper Options}]~
  \begin{itemize}
  \item \index{immediate mode rendering} \index{display lists} Enable or
    disable the use of display lists. Display lists are internal structures
    built by graphics systems. They can potentially speed up rendering but
    can also take up memory.
  \end{itemize}
\item[\gui{Translucent Rendering Options}]~
  \begin{itemize}
  \item \index{depth peeling} Enable or disable depth peeling. Depth
    peeling is a technique ParaView uses to properly render translucent
    surfaces. With it, the top surface is rendered and then ``peeled away''
    so that the next lower surface can be rendered and so on.  If you find
    that making surfaces transparent really slows things down or renders
    completely incorrectly, then your graphics hardware may not be
    implementing the depth peeling extensions well; try shutting off depth
    peeling. \icon{pqAdvanced26}
  \item Set the maximum number of peels to use with depth peeling. Using
    more peels allows more depth complexity, but using fewer peels runs
    faster. You can try adjusting this parameter if translucent geometry
    renders too slowly or translucent images do not look correct.
    \icon{pqAdvanced26}
  \end{itemize}
\item[\gui{Miscellaneous}]~
  \begin{itemize}
  \item When creating very large datasets, default to the outline
    representation. Surface representations usually require ParaView to
    extract geometry of the surface, which takes time and memory. For data
    with size above this threshold, use the outline representation, which
    has very little overhead, by default instead.
  \item \index{rendering!performance} Show or hide annotation providing
    rendering performance information. This information is handy when
    diagnosing performance problems. \icon{pqAdvanced26}
  \end{itemize}
\end{description}

Note that this is not a complete list of ParaView rendering settings. We
have left out settings that do not significantly affect rendering
performance. We have also left out settings that are only valid for
parallel client-server rendering, which are discussed in
Section~\ref{sec:ParallelRenderParameters}.

\subsection{Basic Parallel Rendering}

\index{rendering!parallel|(}

When performing parallel visualization, we are careful to ensure that the
data remains partitioned among all of the processes up to and including
the rendering processes.  ParaView uses a parallel rendering library called
\keyterm{IceT}.  IceT uses a \keyterm{sort-last} algorithm for parallel
rendering.  This parallel rendering algorithm has each process
independently render its partition of the geometry and then
\keyterm{composites} the partial images together to form the final image.
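
The compositing step can be illustrated with a minimal Python sketch. It
assumes opaque geometry, for which the fragment nearest the viewer wins at
each pixel, and it combines the partial images one after another in a single
loop. It is not IceT's implementation, which distributes this work across
processes, and the function name and the pixel layout are hypothetical.

```python
def composite(partials):
    """Depth-composite partial images of opaque geometry (illustrative only).

    Each partial image is a list of (color, depth) pixels produced by one
    process; a depth of float("inf") marks a pixel that process left empty.
    """
    result = list(partials[0])
    for image in partials[1:]:
        for i, (color, depth) in enumerate(image):
            # Keep whichever fragment is closer to the viewer.
            if depth < result[i][1]:
                result[i] = (color, depth)
    return result


EMPTY = (None, float("inf"))
# Two processes each rendered their own partition into a 4-pixel image.
process_0 = [("red", 0.3), ("red", 0.6), EMPTY, EMPTY]
process_1 = [EMPTY, ("blue", 0.4), ("blue", 0.5), EMPTY]

final = composite([process_0, process_1])
print([color for color, _ in final])  # ['red', 'blue', 'blue', None]
```

Note that the work done by the loop depends only on the number of pixels and
the number of partial images, not on how much geometry each process rendered.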

\begin{inlinefig}
  \includegraphics[scale=\bbscale]{images/ParallelRendering}
\end{inlinefig}

The preceding diagram is an oversimplification.  IceT contains multiple
parallel image compositing algorithms such as \keyterm{binary tree},
\keyterm{binary swap}, and \keyterm{radix-k} that efficiently divide work
among processes using multiple phases.

\begin{inlinefig}
  \includegraphics[scale=\bbscale]{images/ParallelRenderingDetail}
\end{inlinefig}

The wonderful thing about sort-last parallel rendering is that the cost of
compositing is insensitive to the amount of data being rendered.
This makes it a very scalable algorithm and well suited to large data.
However, the parallel rendering overhead does increase linearly with the
number of pixels in the image.  Consequently, some of the rendering
parameters deal with the image size.

\begin{inlinefig}
  \includegraphics[scale=\bbscale]{images/ParallelRenderingTiles}
\end{inlinefig}

IceT also has the ability to drive tiled displays, large, high-resolution
displays comprising an array of monitors or projectors.  Using a sort-last
algorithm on a tiled display is a bit counterintuitive because the number
of pixels to composite is so large.  However, IceT is designed to take
advantage of spatial locality in the data on each process to drastically
reduce the amount of compositing necessary.  This spatial locality can be
enforced by applying the \gui{D3} filter to your data.

Because there is an overhead associated with parallel rendering, ParaView
has the ability to turn off parallel rendering at any time.  When parallel
rendering is turned off, the geometry is shipped to the location where
display occurs.  Obviously, this should only happen when the data being
rendered is small.

\subsection{Image Level of Detail}

The overhead incurred by the parallel rendering algorithms is proportional
to the size of the images being generated.  Also, images generated on a
server must be transferred to the client, a cost that is also proportional
to the image size.  To help increase the frame rate during interaction,
ParaView introduces a new LOD parameter that controls the size of the
images.

During interaction while parallel rendering, ParaView can optionally
\index{subsample}subsample the image.  That is, ParaView will reduce the
resolution of the image in each dimension by a factor during interaction.
Reduced images will be rendered, composited, and transferred.  On the
client, the image is inflated to the size of the available space in the
GUI.

\begin{inlinefig}
  \includegraphics[width=.2\linewidth]{images/ImageLODFull}
  \includegraphics[width=.2\linewidth]{images/ImageLOD2}
  \includegraphics[width=.2\linewidth]{images/ImageLOD4}
  \includegraphics[width=.2\linewidth]{images/ImageLOD8}
\end{inlinefig}

The resolution of the reduced images is controlled by the factor by which
the dimensions are divided.  In the preceding images, the left image has
the full resolution.  The remaining images were rendered with the
resolution reduced by a factor of 2, 4, and 8, respectively.
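
As a rough worked example (the $1920 \times 1080$ image size is an arbitrary
choice for illustration), reducing each dimension of a $W \times H$ image by
a factor $f$ leaves
\[
  N(f) = \frac{W}{f} \times \frac{H}{f} = \frac{W H}{f^{2}}
\]
pixels to render, composite, and transfer.  For $W = 1920$ and $H = 1080$,
the full image has $2{,}073{,}600$ pixels, whereas factors of 2, 4, and 8
leave $518{,}400$, $129{,}600$, and $32{,}400$ pixels, respectively.  Because
the pixel count falls with the square of the factor, even a modest factor
removes most of the per-frame image work.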

ParaView also has the ability to compress images before transferring them
from server to client. Compression, of course, reduces the amount of data
transferred and therefore makes the most of the available bandwidth.
However, the time it takes to compress and decompress the images adds to
the latency.
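
This trade-off can be made concrete with a simple back-of-the-envelope
model.  The model and the numbers below are illustrative assumptions, not
measurements of ParaView.  If an uncompressed image occupies $S$ bytes,
compression shrinks it by a ratio $r$, the connection delivers $B$ bytes per
second, and compression and decompression take $t_c$ and $t_d$ seconds, then
delivering one frame takes roughly
\[
  t \approx t_c + \frac{S}{r B} + t_d
\]
plus the latency of the network itself.  For example, a $1920 \times 1080$
image at 3 bytes per pixel is $S \approx 6.2 \times 10^{6}$ bytes.  On a
100~Mbit/s link ($B = 12.5 \times 10^{6}$ bytes per second), sending it
uncompressed takes about $0.5$~s.  With a hypothetical compression ratio of
$r = 10$, the transfer drops to about $0.05$~s, so compression pays off as
long as $t_c + t_d$ stays well below the roughly $0.45$~s saved.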

ParaView contains two different image compression algorithms for
client-server rendering. The first is a custom algorithm called
\keyterm{Squirt}, which stands for Sequential Unified Image Run Transfer.
Squirt is a run-length encoding compression that reduces color depth to
increase run lengths. The second algorithm uses the \keyterm{Zlib}
compression library, which implements a variation of the Lempel-Ziv
algorithm. Zlib typically provides better compression than Squirt, but
takes longer to perform and hence adds to the latency.
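
The idea of reducing color depth to lengthen runs can be illustrated with a
short Python sketch.  This is not Squirt's actual code or data format; the
function names, the bit-masking scheme, and the run representation are
assumptions chosen to keep the example small.

```python
def reduce_color_depth(pixel, bits):
    """Keep only the top `bits` bits of each 8-bit channel."""
    mask = (0xFF << (8 - bits)) & 0xFF
    return tuple(channel & mask for channel in pixel)


def run_length_encode(pixels, bits=8):
    """Encode a sequence of RGB pixels as [count, pixel] runs."""
    runs = []
    for pixel in pixels:
        pixel = reduce_color_depth(pixel, bits)
        if runs and runs[-1][1] == pixel:
            runs[-1][0] += 1          # extend the current run
        else:
            runs.append([1, pixel])   # start a new run
    return runs


# A smooth gradient: every pixel differs slightly from its neighbor.
row = [(200 + i, 100, 50) for i in range(8)]

print(len(run_length_encode(row, bits=8)))  # 8 runs at full color depth
print(len(run_length_encode(row, bits=5)))  # 1 run at reduced color depth
```

At full color depth the gradient defeats run-length encoding entirely, but
once the low-order bits are dropped the eight pixels collapse into a single
run.  The price is the loss of the subtle color variation.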

\subsection{Parallel Render Parameters}
\label{sec:ParallelRenderParameters}

\begin{inlinefig}
  \includegraphics[width=0.8\scw]{images/SettingsServer}
\end{inlinefig}

Like the other 3D rendering parameters, the parallel rendering parameters
are located in the settings dialog box, which is accessed in the menu from
\gui{Edit} \ra \gui{Settings} (\gui{ParaView} \ra \gui{Preferences} on the
Mac).  The parallel rendering options in the dialog are in the \gui{Render
  View} tab (intermixed with several other rendering options such as those
described in Section~\ref{sec:BasicRenderingSettings}). The parallel and
client-server options are divided among several categories, and several are
considered advanced.

\begin{description}
\item[\gui{Remote/Parallel Rendering Options}]~
  \begin{itemize}
  \item \index{remote render threshold} Set the data size at which to
    render remotely in parallel or to render locally. If the geometry is
    over this threshold (and ParaView is connected to a remote server), the
    data is rendered in parallel remotely and images are sent back to the
    client. If the geometry is under this threshold, the geometry is sent
    back to the client and images are rendered locally on the client.
  \item Set the sub-sampling factor for still (non-interactive) rendering.
    Some large displays have more resolution than is really necessary, so
    this sub-sampling reduces the resolution of all images displayed.
    \icon{pqAdvanced26}
  \end{itemize}
\item[\gui{Client/Server Rendering Options}]~
  \begin{itemize}
  \item \index{interactive render!subsample} \index{subsample} Set the
    interactive subsampling factor. The overhead of parallel rendering is
    proportional to the size of the images generated.  Thus, you can speed
    up interactive rendering by specifying an image subsampling rate.  When
    this box is checked, interactive renders will create smaller images,
    which are then magnified when displayed.  This parameter is only used
    during interactive renders. \icon{pqAdvanced26}
  \end{itemize}
\item[\gui{Image Compression}]~
  \begin{itemize}
  \item Before images are shipped from server to client, they optionally
    can be compressed using one of two compression algorithms:
    Squirt\index{Squirt} or Zlib\index{Zlib}. To make the compression more
    effective, either algorithm can reduce the color resolution of the
    image before compression.  The sliders determine the number of color
    bits saved. Full color resolution is always used during a still
    render. \icon{pqAdvanced26}
  \item Suggested image compression presets are provided for several common
    network types. When attempting to select the best image compression
    options, try starting with the presets that best match your connection.
    \icon{pqAdvanced26}
  \end{itemize}
\end{description}
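
The interplay between the remote render threshold and the interactive
subsampling factor can be summarized in a small Python sketch.  It is a
simplified illustration of the behavior described above, not ParaView's
implementation.  The function name, its parameters, the use of megabytes as
the size unit, and the choice to render locally at exactly the threshold are
assumptions.

```python
def plan_render(geometry_mb, remote_threshold_mb, image_size,
                interactive=False, subsample=1):
    """Describe where a frame is rendered and at what resolution."""
    width, height = image_size
    # Small geometry is shipped to the client and rendered there.
    if geometry_mb <= remote_threshold_mb:
        return ("client", width, height)
    # Large geometry is rendered on the server; only interactive renders
    # are subsampled, and the client magnifies the smaller image.
    if interactive:
        width, height = width // subsample, height // subsample
    return ("server", width, height)


print(plan_render(10, 20, (1920, 1080)))
# ('client', 1920, 1080)
print(plan_render(4000, 20, (1920, 1080)))
# ('server', 1920, 1080)
print(plan_render(4000, 20, (1920, 1080), interactive=True, subsample=4))
# ('server', 480, 270)
```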

\index{rendering!parallel|)}

\subsection{Parameters for Large Data}

The default rendering parameters are suitable for most users.  However,
when dealing with very large data, it can help to tweak the rendering
parameters.  The optimal parameters depend on your data and the hardware
ParaView is running on, but here are several pieces of advice that you
should follow.

\begin{itemize}
\item Try turning off display lists. Turning this option off will
  prevent the graphics system from building special rendering structures.
  If you have graphics hardware, these rendering structures are important
  for feeding the GPUs fast enough.  However, if you do not have GPUs,
  these rendering structures do not help much.
\item If there is a long pause before the first interactive render of a
  particular data set, it might be the creation of the decimated
  geometry. Try using an outline instead of decimated geometry for
  interaction. You could also try lowering the factor of the decimation to
  0 to create smaller geometry.
\item Avoid shipping large geometry back to the client. The remote
  rendering will use the power of the entire server to render and ship
  images to the client.  If remote rendering is off, geometry is shipped
  back to the client.  When you have large data, it is generally faster to
  ship images than to ship data (although if your network has a high
  latency, this could become problematic for interactive frame rates).
\item Adjust the interactive image sub-sampling for client-server rendering
  as needed.  If image compositing is slow, if the connection between
  client and server has low bandwidth, or if you are rendering very large
  images, then a higher subsample rate can greatly improve your interactive
  rendering performance.
\item Make sure \gui{Image Compression} is on.  It has a tremendous effect
  on desktop delivery performance, and the artifacts it introduces, which
  are only there during interactive rendering, are minimal.  Lower
  bandwidth connections can try using Zlib instead of Squirt compression.
  Zlib will create smaller images at the cost of longer
  compression/decompression times.
\item If the network connection has a high latency, adjust the parameters
  to avoid remote rendering during interaction. In this case, you can try
  turning up the remote rendering threshold a bit, and this is a place
  where using the outline for interactive rendering is effective.
\item If the still (non-interactive) render is slow, try turning on the
  delay between interactive and still rendering to avoid unnecessary
  renders.
\end{itemize}

\index{rendering|)}


% Chapter Visualizing Large Models