author = {Jin-Ping Gwo and Forrest M. Hoffman and William W. Hargrove},
  title = {Mechanistic-Based Genetic Algorithm Search on a {B}eowulf Cluster of {L}inux {PC}s},
  booktitle = {Proceedings of the High Performance Computing 2000 ({HPC}2000) Conference},
  dates = {16--20 April 2000},
  location = {Washington, DC},
  day = 16,
  month = apr,
  year = 2000,
  abstract = {A simple genetic algorithm (SGA) was implemented on a cluster of Linux PCs to search for the most likely fracture networks in a soil column. The objective is to evaluate the performance of SGAs in a distributed computing environment that is widely and inexpensively available to environmental researchers and engineers. The Beowulf computer was built out of surplus personal computers at Oak Ridge National Laboratory by scientists in the Environmental Sciences Division. Communication on the Beowulf is via an ordinary Ethernet connection private to the processors, with a peak bandwidth of 10 Mbit/s. The CPUs are mostly Intel 486DX-2/66 and Pentiums, with 16--32 MB of memory. Most of the software on the Beowulf is from the public domain. Using the PVM message passing library and a manager-worker paradigm, we seek to maximize the loads on CPUs of dissimilar speed and memory size. SGA is an inductive search algorithm that is based on a few simple operators such as reproduction, crossover, and mutation. The underlying mechanisms of flow and transport phenomena in structured soils with discrete fractures are simulated by the computer code FRACTRAN. In a generation of SGA, hundreds of FRACTRAN simulations are required, which consume the majority of the CPU time needed by the SGA search process. For an entire SGA search, tens of millions of such simulations, often referred to as function evaluations in the genetic algorithms literature, are performed. The minimal communication between the manager and workers, passing fracture networks represented as bit strings to the workers and bit string fitness back to the manager, suggests that small communication bandwidth is adequate to achieve high performance. The manager-worker paradigm is also highly effective in achieving load balance on heterogeneous, networked computers such as the Beowulf.
In addition to reporting the performance of the implementation, we also explore the aspect of SGA related to information constraints. SGA may be trapped in local optima and genetic drift may ensue. With additional information the SGA may be steered away from local optima and the uncertainty of the identified fracture networks may be reduced. Because multiple runs of the SGA search algorithm are necessary to determine the least uncertain fracture networks, a distributed computing environment proves to be highly effective.}
  author = {William W. Hargrove and Forrest M. Hoffman},
  title = {An Analytical Assessment Tool for Predicting Changes in a Species Distribution Map Following Changes in Environmental Conditions},
  booktitle = {Proceedings of the Fourth International Conference on Integrating GIS and Environmental Modeling ({GIS}/{EM}4): Problems, Prospects and Research Needs},
  editor = {B. O. Parks and K. M. Clarke and M. P. Crane},
  publisher = {University of Colorado, Cooperative Institute for Research in Environmental Sciences (CIRES)},
  address = {Boulder, Colorado},
  dates = {2--8 September 2000},
  location = {The Banff Centre, Banff, (AB) Canada},
  isbn = {0-9743307-0-1},
  url = {},
  day = 2,
  month = sep,
  year = 2000,
  abstract = {We have developed a GIS-based statistical technique which empirically predicts changes in the spatial distribution of habitat for a plant or animal species over a geographic area that has undergone a scenario of change in specified environmental conditions. The technique is illustrated with \textit{Pinus taeda} L., loblolly pine, and \textit{Acer saccharum} Marsh., sugar maple, under two future climate change scenarios for the continental U.S.

We use a new Multivariate Spatio-Temporal Clustering (MSTC) approach that we developed for application on raster data within a GIS. MSTC employs non-hierarchical clustering on the individual pixels in a digital map from a GIS for the purpose of classifying the cells into types or categories. Our technique uses the standardized values of each environmental condition (e.g., temperature, rainfall, soil) for every raster cell in the map as a set of coordinates that together specify a position for that raster cell in a data space having a dimension for each of the included environmental characteristics. Two raster cells from anywhere in the map that have similar combinations of environmental characteristics will be located near each other in this data space. Their proximity and relative positions in data space will quantitatively reflect their environmental similarities, allowing these cells to be classified into environmentally similar groups. MSTC combines aspects of traditional GIS and statistical clustering techniques.

Using the classification abilities of MSTC, we compared and grouped map cells by selected environmental conditions found within the present continental U.S. with conditions predicted to occur here according to two alternative future climate scenarios. Environments were specified in terms of 25 condition characteristics. We obtained high-resolution simulation forecasts for conditions within the continental U.S. in the year 2099 according to two global climate simulation models that are recognized by the U.S. National Assessment: the Canadian Climate Centre model, and the Hadley UKMO model. The VEMAP program has made yearly data sets for these models available for the period between 1994 and 2099 at 0.5 degree resolution for the continental United States.

From these models, we obtained the simulated forecasts for monthly minimum and maximum temperature, monthly solar irradiance, and monthly precipitation. We calculated differences between present and future conditions predicted in the year 2099 by each of the two models. Difference layers were applied to our high resolution maps of current conditions within the United States in order to obtain predicted conditions. Thus, sixteen of the 25 environmental conditions were altered to represent the conditions forecast to occur within the continental U.S. in the year 2099 by each model.

A variant of the MSTC procedure can be used to generate a model of the environmental envelope or niche of a particular species. We use the current distribution of \textit{Pinus} and \textit{Acer} to define their respective niches, then alter the map, and use the niche definition to classify map pixels and delineate the new distribution of habitat suitable for these tree species within the map. This technique is significant for assessing predicted effects of changes in environmental conditions (e.g., global warming) on the potential distribution of suitable habitat for both plants and animals.

Making a special prediction of the current geographic range of a species allows us to test the robustness and adequacy of the niche model. Conditions present within the current U.S. are used to predict a current geographic range for the species, which can be compared with the known actual geographic distribution. When the current conditions within the United States are tested against the hypervolume definitions to obtain a fitness prediction for each species, the predicted distributions strongly resemble the known current distributions for both of these tree species. The niche model-based geographic predictions are somewhat more extensive in terms of the outer, low-fitness peripheral areas, but still strongly resemble the original geographic ranges which were used as input to the model development process.

Predictions for a simple uniform climate warming scenario are surprising. Habitat distributions for the tested tree species generally dissipate and evaporate, without visible northward migration. We speculate that environmental conditions in many cells of the new maps may represent new combinations never seen in the present U.S. The performance of species cannot be predicted inside such cells, since the cells have left the inference space of the training data set. Methods for identifying such unpredictable cells, and speculation about their abundance and distribution, are discussed.}