-
Notifications
You must be signed in to change notification settings - Fork 0
/
body.tex
executable file
·336 lines (297 loc) · 17.7 KB
/
body.tex
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
\section{Introduction}
\subsection{Motivation and Background}
Since the publication of the Fortran 2008 standard in 2010~\cite{iso2010information}, Fortran supports a \gls{spmd}
programming style that facilitates the creation of a fixed number of replicas of a compiled program, wherein each replica
executes asynchronously after creation. Fortran refers to each replica as an image. The primary mechanism for distributing and
communicating data between images involves defining \glspl{coarray}, entities that may be referenced or defined on one image by
statements executing on other images. As such, a coarray defines a \gls{pgas} in which one image referencing or defining a
coarray on another image causes inter-image communication.
The seminal role that \glspl{coarray} played in the development of Fortran's intrinsic parallel programming model have made it
common to refer to all of modern Fortran's parallel programming features under the rubric of \gls{caf}. To date, most
published \gls{caf} applications involve scenarios wherein the parallelization itself poses one of the chief
challenges and necessitates the custom development of parallel algorithms. These include ordinary and partial differential
equation solvers in domains ranging from nuclear fusion~\cite{preissl2011multithreaded} and
weather~\cite{mozdzynski2015partitioned} to multidimensional fast Fourier transforms and multigrid numerical
methods~\cite{garain2015comparing}. Much of the effort involved in
expressing parallel algorithms for these domains centers on
designing and using various \gls{coarray} data structures. In such settings, the moniker
\gls{caf} seems appropriate.
Less widely appreciated are the ways Fortran's intrinsic parallel programming model supports embarrassingly
parallel applications, wherein the division into independent sub-problems requires little coordination between
the sub-problems. To support such applications, a parallel programming
model might provide for explicit sub-problem disaggregation and independent sub-problem execution without any need for
\gls{pgas} data structures such as \glspl{coarray}. The draft Fortran 2018 standard
(previously named ``Fortran 2015''\footnote{A Committee Draft is at https://bit.ly/fortran-2015-draft.}) offers several
features that enable a considerable amount of parallel computation, coordination, and communication
even without \glspl{coarray}. A working definition of ``embarrassingly parallel'' Fortran might denote the class of
use cases for which parallel algorithmic needs are met by the non-\gls{coarray} parallel features, including
\begin{itemize}
\item Forming teams of images that communicate only with each other by default,
\item Image synchronization: a mechanism for ordering the execution of program segments in differing images,
\item Collective subroutines: highly optimized implementations of common parallel programming patterns,
\item Image enumeration: functions for identifying and counting team members,
\item Global error termination.
\end{itemize}
We anticipate that a common use case will encompass an ensemble of simulations, each member of
which executes as a parallel computation in a separate team. This paper presents such a use case for
\gls{wrf-hydro}, a terrestrial hydrological model developed at \gls{ncar}.
\subsection{Objectives}\label{sec:objectives}
The objectives of the current work are threefold:
\begin{enumerate}
\item To contribute the first-ever compiler front-end support for Fortran 2018 teams.
\item To contribute the parallel runtime library functionality required to support the new compiler capabilities.
\item To study and address issues arising form integrating teams into an existing \gls{mpi} application, \gls{wrf-hydro},
\end{enumerate}
The compiler front-end described herein lives on the
\fnurl{Sourcery Institute \gls{gcc} fork's}{https://github.com/sourceryinstitute/gcc} teams branch. The parallel runtime library lives
on the \fnurl{OpenCoarrays}{https://github.com/sourceryinstitute/opencoarrays} opencoarrays-teams
\gls{abi}. We are unaware of any compiler support for Fortran 2018 teams other than that developed for the current
project.
%In the remainder of this paper, Section~\ref{sec:methodology} describes the teams
%support in the \gls{gcc} fork and in OpenCoarrays; Section~\ref{sec:discussion} discusses the use of teams
%in \gls{wrf-hydro} and a required language extension; Section~\ref{sec:conclusions} summarizes our conclusions
%and future work plans.
\section{Methodology}\label{sec:methodology}
Fortran 2018 will facilitate forming nonoverlapping image
groups, allowing more efficient and independent subproblem execution.
Multiphysics applications, e.g., climate and weather models, may benefit from the
consequent reduction in off-node communication, particularly when an entire team
fits in a single compute node.
\subsection{Teams in Fortran 2018}\label{teams-in-fortran-2015}
A team is a set of images that can execute independently of other images outside the team.
At program launch, all images comprise a team designated by a language-defined \texttt{initial\_team} integer identifier.
Except for the initial team, every team has a parent team in a one-to-many parent/child hierarchy.
A program executes a \texttt{form team} statement to create a mapping for subsequent grouping of images into
new child teams. Each new team has an integer identifier that Fortran produces as the result of invoking the
\texttt{team\_number()} intrinsic function. Information about the team to which the current image belongs can be
determined by the compiler from the collective value of the team variables on the images of the team.
All images execute the \texttt{form team} statement as a collective operation.
An image changes teams by executing \texttt{change team} or
\texttt{end team}. The former moves the executing image form the current team to a team
specified by a derived-type~\texttt{team\_type} variable. Subsequent execution of a corresponding \texttt{end team}
statement restores the current team back to that team to which it belonged immediately prior to execution of the most
recent \texttt{change team}.
The \texttt{form team} statement takes an \texttt{team\_number} argument uniquely identifying the team and
a \texttt{team\_type} argument encapsulating other team information in private components.
Successful execution of a \texttt{form team} statement assigns the team-variable (of type \texttt{team type}) on each
participating image a value that specifies the new team to which the image will belong.
The \texttt{change team} statement takes as argument a \texttt{team type} variable that represents the new team to be used as
current team. The execution of the \texttt{end team} statement restores the current team back
to that immediately prior to execution of the \texttt{change team} statement.
Figures~\ref{fig:team-number-test}-\ref{fig:get-communicator-test} demonstrate the use of teams in two
unit tests from the OpenCoarrays repository.
\begin{figure}
\lstinputlisting{figures/tests/team-number.f90}
\vspace{-12pt}
\caption{A unit test for the team-number function.\label{fig:team-number-test}}
\end{figure}
\begin{figure*}
\lstinputlisting{figures/tests/get-communicator.f90}
\vspace{-12pt}
\caption{A unit test for the get\_communicator function.\label{fig:get-communicator-test}}
\end{figure*}
\subsection{Teams in \gls{gcc} and OpenCoarrays}\label{subsec:teams-in-gcc}
A Fortran team is comparable to an MPI communicator. OpenCoarrays uses
\texttt{MPI\_Comm\_split} to support
\texttt{form team}, passing
the team id as the \texttt{color},
and storing the resulting communicator in an available-teams list.
Every list element tracks the team/communicator pairings.
The function returns the available-teams list stored in the \texttt{team type} variable.
We initialize the available- and used-teams lists to 1 (equivalent to \texttt{MPI\_COMM\_WORLD}) at the beginning of the execution.
At a \texttt{change team}, the available-teams list element stored in the \texttt{team\_type} variable
gets passed to the corresponding OpenCoarrays function. The \texttt{current\_team} variable used inside OpenCoarrays for
representing the current communicator gets reassigned with the value contained into the element of the list passed as argument.
Finally, a new element is added to the list of used teams. The list elements are pointers to the elements of the available-teams list.
The insertion operation is always performed at the beginning of the list in order to keep track of the teams hierarchy.
An execution of the \texttt{end team} statements is implemented by removing the first element of the list of used teams and reassigning the
\texttt{current team} to the new first element of the list of used teams.
Figure~\ref{fig:teams} depicts schematically an initial team of images (black arrows) executing over time (progressing
downward) and able to coordinate and communicate through a global mechanism (black horizontal line). At the point of executing
\texttt{form team} and \texttt{change team} statements, the compiler inserts references to the OpenCoarrays \gls{abi} into the
executable program. Those references cause invocations of \texttt{MPI\_Split}, which in turn creates the colored groupings that
correspond to teams in Fortran 2018.
\begin{figure}
\includegraphics[width=0.4\textwidth]{figures/teams}
\vspace{-24pt}
\caption{Schematic of program execution over time
(left axis) in 12 images (top) communicating globally and then within subgroups. Horizontal lines show communication mechanisms (default=solid, optional=dashed). Fortran concepts (left). Underlying \gls{mpi} concepts (right).\label{fig:teams}}
\end{figure}
%
\begin{figure}
\includegraphics[width=0.46\textwidth]{figures/WRF-Hydro-caf-ens-model_chain.png}
\vspace{-10pt}
\caption{WRF-Hydro caffeination via Fortran 2018
teams: example components of the National Water
Model. Different \gls{mpi} colors represent independent teams,
each of which is an ensemble member.
\label{fig:caffeinate-wrf-hydro}}
\end{figure}
%
The teams unit tests in
Figures~\ref{fig:team-number-test}--\ref{fig:get-communicator-test} use a block distribution of images,
dividing the initial team into three new teams, each with the same number of images except some teams
with one extra. The number of teams with an extra image equals the remainder of integer division of the total
number images by the number of teams. In Figure~\ref{fig:get-communicator-test}, an assertion procedure
terminates across all images if \texttt{assertion} is false. The optional second argument in \texttt{assert}
describes the checks performed.
\subsection{A language extension}
Line 2 in Figure~\ref{fig:get-communicator-test} imports a \texttt{get\_communicator()} function via Fortran's
use-association mechanism for accesing entities in Fortran modules: an \texttt{opencoarrays} module that
provides language extensions. Lines 8 and 12 invoke this function
to provide the \gls{mpi} communicator to the test subroutine \texttt{mpi\_mpatches\_caf}. Assertions in the
latter subroutine verify the expected correspondences between \gls{mpi} image and rank numbering.
The \gls{mpi}/\gls{caf} correspondence enables the newly caffeinated \gls{wrf-hydro}
to interoperate safely with the existing \gls{wrf-hydro} \gls{mpi} code.
\section{Discussion of Results}\label{sec:discussion}
\gls{wrf-hydro} is a community hydrologic model providing a parallel-computing
framework for coupling weather prediction (Figure \ref{fig:caffeinate-wrf-hydro}a), land surface
(Figure \ref{fig:caffeinate-wrf-hydro}b), and hydrologic routing to handle spatial water redistribution
via overland flow (Figure \ref{fig:caffeinate-wrf-hydro}c), subsurface (soil column) flow (Figure \ref{fig:caffeinate-wrf-hydro}c),
baseflow (deep groundwater, \ref{fig:caffeinate-wrf-hydro}d), and stream channel transport
(Figure \ref{fig:caffeinate-wrf-hydro}e)~\cite{gochisEtal2014}.
Developed to couple land hydrology to the atmosphere,
WRF-Hydro usually runs ``offline'' with
forcing from upper-boundary (weather) conditions (Fig.~\ref{fig:caffeinate-wrf-hydro}).
The chief example is \gls{nwm} of \gls{noaa}~\cite{noaa2016}, a special
configuration of WRF-Hydro providing operational, real-time analysis and forecasts
over the U.S.
Ensemble forecasting and ensemble data assimilation are
growing research areas for WRF-Hydro. Running ensembles under one
executable program and job submission as shown in
Fig. \ref{fig:caffeinate-wrf-hydro} can reduce the
labor in workflow design and can open up
possibilities for improving data flows and optimizing ensemble
execution time.
\begin{figure}
\includegraphics[width=0.36\textwidth]{figures/init_timing_linear.png}
\vspace{-7pt}
\caption{Scaling of the channel-only initialization time to
support Amdhal's law calculation of potential speedup of ensemble
computation with shared initialization. \label{fig:wrf-hydro-init-scaling}}
\end{figure}
An optimization that we expect teams to facilitate will involve sharing initialization
work over all ensemble members using
the full image set before changing teams. Amortizing common
initialization work across all available computational resources
rather than repeating common initialization work in each
ensemble member using only that member's fraction of the resources might
save significant computational costs in runs for which
initialization occupies a large fraction of
the run time such as in production runs of short-term \gls{nwm}
forecasts. We are exploring
moving from deterministic
(single ensemble member) runs to ensemble runs for the stream
channel submodel in isolation. Fig.~\ref{fig:wrf-hydro-init-scaling}
presents the scaling behavior of channel-only runs.
Running 50 ensemble members concurrently in
teams occupying 20 cores each, the initialization speedup will be
approximately 20x. Currently, for the short-range forecasts,
initialization requires approximately 50\% of the run time on
NCAR's yellowstone computer and greater than that on \gls{noaa} WCOSS
system~\cite{yuetal2017}. These numbers are
similar for the channel-only model. In our proposed application for an ensemble size of 50, we estimate that
approximately half the runtime ($f=.5$) can be sped up by a factor of
about 20 ($S_f=20$), based on Fig.~\ref{fig:wrf-hydro-init-scaling}. From
these estimates, Amdhal\'s law yields a total speedup
of 1.9:
\begin{equation}
S = 1 / { f/S_f + (1-f) } = 1 / (.5/20 + (1-.5)) = 1.9
\end{equation}
We converted the \gls{wrf-hydro} main program to a subroutine
called in a loop over the ensemble members.
We deleted \texttt{MPI\_Init} and \texttt{MPI\_FInalize} calls in the converted
subroutine and instead pass a communicator to the subroutine. To
loop over the ensemble also required deleting the \texttt{stop} statement in the
subroutine.
\section{Conclusions and Future Work}\label{sec:conclusions}
We applied Fortran 2018 teams to \gls{wrf-hydro} ensemble simulation and forecasting.
We developed the first-ever compiler front-end and parallel runtime library support
for teams and a language extension that exposes \gls{caf}'s
underlying \gls{mpi} communicator for use in \gls{wrf-hydro}. This approach facilitates
incremental introduction of \gls{caf}, i.e.,``caffeination,'' of the
pre-existing \gls{mpi} program. We predict a speedup of 1.9 in ensemble execution when
spreading common initializations across all images prior to changing teams. Such a speedup
would grealy impact \gls{wrf-hydro} operational use cases.
%%%
%%% Sample tables
%%%
%\begin{table}
% \caption{Frequency of Special Characters}
% \label{tab:freq}
% \begin{tabular}{ccl}
% \toprule
% Non-English or Math&Frequency&Comments\\
% \midrule
% \O & 1 in 1,000& For Swedish names\\
% $\pi$ & 1 in 5& Common in math\\
% \$ & 4 in 5 & Used in business\\
% $\Psi^2_1$ & 1 in 40,000& Unexplained usage\\
% \bottomrule
%\end{tabular}
%\end{table}
%\begin{table*}
% \caption{Some Typical Commands}
% \label{tab:commands}
% \begin{tabular}{ccl}
% \toprule
% Command &A Number & Comments\\
% \midrule
% \texttt{{\char'134}author} & 100& Author \\
% \texttt{{\char'134}table}& 300 & For tables\\
% \texttt{{\char'134}table*}& 400& For wider tables\\
% \bottomrule
% \end{tabular}
%\end{table*}
% end the environment with {table*}, NOTE not {table}!
%It is strongly recommended to use the package booktabs~\cite{Fear05}
%and follow its main principles of typography with respect to tables:
%\begin{enumerate}
%\item Never, ever use vertical rules.
%\item Never use double rules.
%\end{enumerate}
%It is also a good idea not to overuse horizontal rules.
%%%
%%% Sample figures
%%%
%\begin{figure}
%\includegraphics{fly}
%\caption{A sample black and white graphic.}
%\end{figure}
%\begin{figure}
%\includegraphics[height=1in, width=1in]{fly}
%\caption{A sample black and white graphic
%that has been resized with the \texttt{includegraphics} command.}
%\end{figure}
%\begin{figure*}
%\includegraphics{flies}
%\caption{A sample black and white graphic
%that needs to span two columns of text.}
%\end{figure*}
%
%\begin{figure}
%\includegraphics[height=1in, width=1in]{rosette}
%\caption{A sample black and white graphic that has
%been resized with the \texttt{includegraphics} command.}
%\end{figure}
%\end{document} % This is where a 'short' article might terminate
%\appendix
%Appendix A
%\section{Teams unit tests}
% This next section command marks the start of
% Appendix B, and does not continue the present hierarchy
%\begin{figure*}
%\includegraphics[width=0.7\textwidth]{figures/hydro-map}
%\vspace{-36pt}
%\caption{Sample \gls{wrf-hydro} simulation domain.}
%\end{figure*}
%
\begin{acks}
The first author thanks the
\gls{ncar} Visitor Program for supporting a visit to \gls{ncar} to
perform much of the work described herein.
% The work is
% supported by the \grantsponsor{GSxxxxx}{National
% Science Foundation China}{http://dx.doi.org/zz.yyyyy/xxxxx} under Grant
% No.:~\grantnum{GSxxxxx}{yyyyyyy}
\end{acks}