The GIS Weasel
The GIS Weasel and Data
Although the only required data input to the GIS Weasel is an ArcInfo GRID of elevation, there are several standard ancillary data sets that are usually associated with the GIS Weasel. There are standard assumptions about the naming conventions, meaning and representation, and organization of these data. All of these are described in sufficient detail that the user should be able to assemble a customized set of ancillary data.
Included in this text is a list of Internet sources for all of these standard data. Much of it comes from non-USGS sources. If you are uncertain as to the character of these data sets, please take the time to validate them for your own work prior to using them. The USGS makes no guarantees about these data.
We are attempting to simplify the integration of alternative data layers into the standard GIS Weasel ancillary data structure by making the GIS Weasel ancillary data structure more flexible.
As always, the user is encouraged to contact the authors with questions or suggestions.
Elevation Data:
The user should choose the appropriate scale of data for their application. A rule of thumb for selecting an appropriate scale is that an area-of-interest (AOI) should have, as an absolute minimum, 10,000 cells. Between 50,000 and 100,000 cells is usually adequate, unless you intend to do explicit routing based on drainage patterns extracted from the elevation GRID. To estimate the number of cells that will comprise the AOI, divide the expected area by the square of the cell size. The GIS Weasel has no requirements or limitations associated with scale, cell size or the geographic extent of the elevation GRID. It should be noted that the elevation GRID that is provided as input to the GIS Weasel must have a complete coordinate system definition. This includes the specification of the vertical (z) units, lateral (xy) units, datum and/or spheroid, as well as projection name.
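The cell-count estimate described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic: the function names, the 250 square kilometer AOI, and the labels "too few", "marginal" and "adequate" are assumptions made for this example and are not part of the GIS Weasel.

```python
def estimate_cell_count(area_m2, cell_size_m):
    """Approximate number of cells in an AOI: area divided by cell size squared."""
    if cell_size_m <= 0:
        raise ValueError("cell size must be positive")
    return round(area_m2 / cell_size_m ** 2)


def classify_cell_count(n_cells):
    """Apply the rule of thumb from the text (labels are illustrative only)."""
    if n_cells < 10_000:
        return "too few"    # below the absolute minimum of 10,000 cells
    if n_cells < 50_000:
        return "marginal"   # above the minimum, below the usual adequate range
    return "adequate"       # 50,000 cells or more


# Hypothetical AOI of 250 square kilometers, expressed in square meters.
aoi_area_m2 = 250 * 1_000_000

for cell_size_m in (30, 100, 1000):
    n = estimate_cell_count(aoi_area_m2, cell_size_m)
    print(cell_size_m, n, classify_cell_count(n))
```

The area and the cell size must be in the same lateral units (meters here) for the result to be meaningful.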
Elevation data is now freely available from the USGS National Map – Seamless Data Distribution System (http://seamless.usgs.gov), which specializes in a variety of raster data themes. This system is vastly more convenient than the older system of distribution via quadrangle tiles. Note that there are two viewers available. The viewer for the United States distributes the National Elevation Dataset (NED), which has a cell size of 30 meters or finer. In some places, higher resolution data is available. The viewer for areas outside of the United States distributes Shuttle Radar Topography Mission (SRTM) elevation data, which has a 90 meter cell size.
If this map-based interface is not a convenient option, then older versions of the elevation data are still available. Note that the older elevation data are referred to as Digital Elevation Models (DEMs), and are actually a different data product than the NED. If DEMs are desired, the user should decide whether 1:250,000 or 1:24,000 scale elevation models are required. These scales equate to cell sizes of approximately 100 meters and 30 meters, respectively. Note that the 1:250,000 elevation models are distributed in the USGS DEM format and the 1:24,000 elevation models are distributed in the SDTS-Raster Profile format. The distribution of DEM files has been outsourced by the USGS to a variety of commercial vendors. These vendors offer the data at no charge, as well as for a fee. See the US GeoData page (http://edc.usgs.gov/doc/edchome/ndcdb/ndcdb.html) for more information. Note that this data is obsolete and is not recommended.
For areas outside of the United States, the USGS also distributes the GTOPO30 elevation data set through its Distributed Active Archive Center (DAAC), via its GTOPO pages. This data is provided at a resolution of 30 arc-seconds, which is roughly equivalent to a 1 kilometer cell size. The resolution of this elevation data is suitable for regional-scale studies, but is not typically sufficient for things like watershed modeling.
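The "roughly 1 kilometer" figure for a 30 arc-second cell can be checked with a short calculation. The sketch below assumes a spherical Earth with a mean radius of 6,371 kilometers, which is an approximation chosen for this example; the function name is hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius, spherical approximation (assumption)


def arcsec_to_meters(arcsec, latitude_deg=0.0):
    """Return (north-south, east-west) ground size in meters of an angular cell."""
    angle_rad = math.radians(arcsec / 3600.0)
    north_south = EARTH_RADIUS_M * angle_rad
    # East-west size shrinks with the cosine of latitude.
    east_west = north_south * math.cos(math.radians(latitude_deg))
    return north_south, east_west


for lat in (0, 40, 60):
    ns, ew = arcsec_to_meters(30, lat)
    print(f"latitude {lat}: {ns:.0f} m north-south, {ew:.0f} m east-west")
```

The north-south size stays a little under 1 kilometer everywhere, while the east-west size decreases away from the equator, so the effective cell size of geographic (unprojected) data varies with latitude.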
Note that the elevation data downloaded from the web might have a finer resolution than desired (i.e. too many rows and columns), consuming CPU cycles and disk space. The user can resample the original elevation data to a coarser resolution, if needed. In addition, the eventual ArcInfo GRID created from the downloaded elevation data may express that information as either floating point or integer type numbers. Integer type grids tend to occupy significantly less space on hard disk and in memory, and are faster to process. The user should evaluate whether integerizing their data will result in a significant loss of information prior to carrying out this transformation.
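One simple way to judge the loss from integerizing is to compare the rounding error against the total relief in the data. The following is a minimal sketch on a plain list of numbers; the sample elevations and function names are invented for illustration, and a real check would read cell values from the user's own GRID.

```python
def rounding_errors(values):
    """Absolute error introduced at each cell by rounding to the nearest integer."""
    return [abs(v - round(v)) for v in values]


def summarize_integer_loss(values):
    """Summarize rounding error alongside the relief (max minus min) of the data."""
    errors = rounding_errors(values)
    return {
        "max_error": max(errors),
        "mean_error": sum(errors) / len(errors),
        "relief": max(values) - min(values),
    }


# Hypothetical sample of floating point elevations (meters) from flat terrain.
elevations_m = [1523.27, 1523.81, 1524.46, 1526.02]
print(summarize_integer_loss(elevations_m))
```

In low-relief terrain such as this sample, a rounding error of up to half a vertical unit is a large share of the total relief, which is the situation where integerizing is most likely to matter.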
The data_bin:
Purpose
Some of the GIS Weasel's parameterization routines expect pre-defined data sets to be organized into a set of ArcInfo workspaces called the “data_bin”. All parameters whose derivation requires non-topographic information (i.e. anything other than elevation or its derivatives such as slope and aspect) will attempt to draw information from the data_bin. In general, parameters that require vegetation type, vegetation density, land use/land cover, or soils are based on ArcInfo grids that are stored in the data_bin. For a description of each parameterization methodology, the user is referred to the GIS Weasel online documentation. This can be located beneath the GIS Weasel installation area (“…/weasel”), at …/weasel/src_doc/params/index.html. The user should note that the information in the GIS Weasel online documentation is still being developed and should be considered a draft.
Contents
The data_bin currently holds three sub-workspaces: forests, lcov_comp, and soils. The image below diagrams the data_bin workspaces, the standard names and item definitions for the GRIDs and Info tables found in them, and the standard code conventions these use.
The forests subdirectory contains two grids, lower48 and density. lower48 contains a map of vegetation speciation and density contains vegetation density information. Links to the original metadata for both of these layers are listed below in the section titled Sources of Nationally Available Data for the data_bin.
The lcov_comp directory contains one grid, called lulc. lulc is a composite of the …/forests/lower48 grid and the Global Land Cover Characterization (GLCC) data for the United States. For locations where …/forests/lower48 showed “non-forest”, the attribute value from the GLCC grid was used. Although the GLCC data does have vegetation classes, the US Forest Service produced …/forests/lower48 was assumed to have greater accuracy and was therefore used as the primary descriptor for vegetation type when deriving the lulc grid.
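The compositing rule described above can be sketched as follows. The text does not give the numeric code that marks "non-forest" in lower48, nor how the forest and GLCC code ranges are kept distinct in lulc, so the constant NON_FOREST and all sample values other than 10 are assumptions made purely for illustration.

```python
NON_FOREST = 0  # hypothetical code for "non-forest"; check the lower48 metadata


def composite_lulc(forest_codes, glcc_codes, non_forest=NON_FOREST):
    """Cell by cell: keep the forest code, except use GLCC where forest is non-forest."""
    if len(forest_codes) != len(glcc_codes):
        raise ValueError("both grids must have the same number of cells")
    return [
        glcc if forest == non_forest else forest
        for forest, glcc in zip(forest_codes, glcc_codes)
    ]


# Flattened hypothetical cell values from the two grids.
forest = [10, 0, 0, 22]
glcc = [7, 7, 3, 3]
print(composite_lulc(forest, glcc))
```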
The soils directory contains one sub-workspace called statsgo. This refers to the US Department of Agriculture produced State Soil Geographic Database (STATSGO). Inside the statsgo workspace is one ArcInfo grid called “muid”, whose name refers to Mapping Unit ID. The original STATSGO data is not used by the GIS Weasel. A USGS derived version is used. For further details on the structure of the STATSGO data that the GIS Weasel uses, please refer to the information below in the section titled Sources of Nationally Available Data for the data_bin. Note that within the statsgo workspace, numerous info tables also exist.
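Because an ArcInfo GRID is stored on disk as a directory inside its workspace, a quick check of a data_bin layout against the standard names listed above can be sketched in Python. This only tests that the expected directory names exist; it does not verify that they are valid GRIDs or workspaces, and the function name and the "data_bin" path in the usage line are assumptions for this example.

```python
from pathlib import Path

# Standard GRID locations relative to the data_bin, as described in the text.
EXPECTED_GRIDS = (
    "forests/lower48",
    "forests/density",
    "lcov_comp/lulc",
    "soils/statsgo/muid",
)


def missing_grids(data_bin):
    """Return the expected GRID paths that are not present under data_bin."""
    root = Path(data_bin)
    return [rel for rel in EXPECTED_GRIDS if not (root / rel).is_dir()]


# Hypothetical location of the data_bin; adjust to the actual path.
print(missing_grids("data_bin"))
```

An empty list means every standard GRID name was found. A custom data_bin that deliberately omits the soils data would be expected to report soils/statsgo/muid as missing.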
Note
The data_bin is not just a directory, but also an ArcInfo workspace. The data_bin should not be created with an operating system command like “mkdir”. It should be created with the ArcInfo command “CREATEWORKSPACE”. Please see the ArcInfo online help for more information concerning the use of CREATEWORKSPACE. The user is referred to Contents>>Arc/Info Concepts>>The Arc/Info Workspace, therein.
Requesting a Prepared data_bin:
We are able to provide, on a limited basis, data_bins for portions of the conterminous (i.e. lower 48) United States. If the user would like a custom data_bin, then they should send email to rviger@usgs.gov and specify:
- the ftp URL of a compressed ArcInfo EXPORT file of a coverage enclosing the AOI
- the projection, including datum and spheroid, of the posted coverage
- a telephone number
- a corresponding email address
Please refer to the ArcInfo online help for information on the use of the EXPORT command (Index>>EXPORT (ARC Command)).
Often it is most convenient to generate a box that encloses your AOI. If the user is uncertain where the boundaries of the AOI are, they can run the GIS Weasel through the point at which an AOI boundary is delineated (the menu says “AOI Delineation” in the title bar), QUIT the GIS Weasel, and then use the coverage called “basin_v”. This coverage will be found in the user specified GIS Weasel Write Directory.
If the AOI is beyond the conterminous United States, then the user is encouraged to read the following section and then to contact us for further information.
Creating A Custom data_bin:
If we are unable to respond to your request for a prepared data_bin, then the user will have to create their own data_bin. If the AOI is within the United States and the standard data_bin products are acceptable, then the user can download the data and populate the ArcInfo workspaces of the data_bin with ArcInfo GRID versions of the downloaded data. The user is encouraged to examine the metadata for each of the data_bin GRIDs, found via links below, prior to using these layers.
Again, note that the data_bin structure is composed of ArcInfo workspaces, which should not be confused with operating system created directories. For more information on ArcInfo workspaces, the user is referred to the ArcInfo online help (Contents>>Arc/Info Concepts>>The Arc/Info Workspace).
Using Non-Standard GRIDs in the data_bin
If the AOI is outside of the United States or the user wishes to use alternate data, then the exercise of creating a custom data_bin becomes more complicated. If the standard parameterization routines are to be used, then the user should mimic the naming and pathname conventions of the standard data_bin using their data.
The user is responsible for ensuring that the content of a replacement GRID has the same meaning as the original. For instance, replacing the GRID …/data_bin/soils/statsgo/muid, with a grid of soil available water holding capacity is NOT appropriate. The standard parameterization routines pertaining to soil characteristics will fail as a result of non-standard data structure. The standard data_bin soils data is a particularly complicated layer that relies on several associated INFO tables via relates. Most custom data_bin structures simply omit the soils data. This results in the user deriving soil-based parameters outside of the GIS Weasel, which may be simpler than trying to mimic the STATSGO-based data structure that the GRID muid uses.
If the user understands the data_bin pathname and naming conventions and the meaning of each GRID, then a discussion of the content of the GRIDs can begin. There are two classes of GRIDs: ones with Value Attribute Tables (VATs), and ones without VATs. An easy way to guess which class a GRID falls in is to determine whether the GRID contains categorical information or not. Categorical GRIDs (e.g. GRID of numerical codes representing “what kind of trees exist here”) typically have a VAT. Those GRIDs that do not have VATs typically have a huge range of values (e.g. a GRID of elevation or vegetation density). This is not a hard-and-fast rule, just a guide. For further information on GRIDs and VATs, the user is referred to the ArcInfo online help (Index>>VAT>>GRID data model; Index>>VAT>>GRID data storage). Although GRIDs may contain character-type data, the GIS Weasel will never access this information. GRIDs must contain some numeric information.
The GRID of vegetation density (…/data_bin/forests/density) is probably the easiest one to replace. The user must simply ensure that the range of values in the replacement GRID corresponds to that found in the original (0-100).
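A range check of this kind can be sketched on a plain list of cell values. The function names, the nodata handling, and the sample values are assumptions for illustration; a real check would read the values from the replacement GRID.

```python
def out_of_range(values, low=0, high=100, nodata=None):
    """Return the cell values that fall outside the expected low..high range."""
    return [v for v in values if v != nodata and not (low <= v <= high)]


def looks_like_fraction(values, nodata=None):
    """True if all data values are at most 1, hinting at a 0-1 scale instead of 0-100."""
    data = [v for v in values if v != nodata]
    return bool(data) and max(data) <= 1


# Hypothetical replacement density values; -9999 stands in for a nodata marker.
density = [0, 45, 100, 101, -9999]
print(out_of_range(density, nodata=-9999))
print(looks_like_fraction([0.0, 0.45, 1.0]))
```

The second function covers a case the simple range test misses: a grid expressed as fractions from 0 to 1 lies within 0-100 but would not have the same magnitude as the original.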
The GRIDs of forest type and land cover (…/data_bin/forests/lower48, …/data_bin/lcov_comp/lulc) are categorical GRIDs and have VATs. Within the VATs of these GRIDs are numerical codes that represent landscape characteristics. For instance, the value “10” in …/data_bin/forests/lower48 represents aspen and birch trees. The standard GIS Weasel parameterization routines respond in a pre-determined manner to the code numbers found in the VATs of categorical GRIDs. If the code convention of the GRID …/data_bin/forests/lower48 no longer conforms to the standard, the GIS Weasel has no way of knowing this. For instance, if the value “10” now represents pine trees, the GIS Weasel will produce erroneous parameter values due to lack of a priori information about the code convention of the GRID.
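The effect of a mismatched code convention can be illustrated with a toy lookup. Only the meaning of code 10 (aspen and birch) comes from the text; the "leaf habit" parameter, the dictionary names, and the custom convention are hypothetical and do not represent actual GIS Weasel routines or remap tables.

```python
# Standard convention: 10 = aspen and birch, which are deciduous.
STANDARD_LOOKUP = {10: "deciduous"}

# Hypothetical custom convention in which 10 = pine, which is coniferous.
CUSTOM_LOOKUP = {10: "coniferous"}


def derive_parameter(codes, lookup):
    """Translate each categorical cell code into a parameter value via a lookup."""
    return [lookup[code] for code in codes]


# A custom grid whose cells are all pine, coded as 10.
custom_grid_codes = [10, 10, 10]

# The standard lookup has no way to know the codes changed meaning.
print(derive_parameter(custom_grid_codes, STANDARD_LOOKUP))

# Supplying a lookup that matches the custom convention gives the intended result.
print(derive_parameter(custom_grid_codes, CUSTOM_LOOKUP))
```

The first call runs without any error and silently returns the wrong values, which is why the code convention has to be checked by the user and cannot be detected by the software.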
In replacing the forest type and land use/land cover (categorical) GRIDs of the data_bin and establishing their code conventions, there are two alternatives:
- The user may attempt to mimic the standard code convention of the default GRID. If the user elects to do this, then they are referred to the metadata, accessible below, for the corresponding GRID for information on the standard coding convention.
- The second alternative is to use data_bin GRIDs with custom code conventions and replace the standard reclassification schemes that the GIS Weasel uses to derive parameters from data_bin GRIDs. The reclassification schemes can be considered the a priori information alluded to above. The schemes are contained in ASCII files called remap tables. Remap tables are formatted according to the ArcInfo specification for remap tables (see ArcInfo online help: Contents>>Spatial Modeling>>Cell-based Modeling with GRID>>Remap Tables in GRID). The files end with the suffix “.rmp” and are stored in the directory …/weasel/src_aml. The user should determine which GIS Weasel parameterization routines are going to be used and then check the GIS Weasel Parameterization Documentation for information on the relevant routines. The GIS Weasel Parameterization Documentation is accessible by using an HTML browser to examine the file …/weasel/src_doc/params/index.html. The user should note that the information in the GIS Weasel Parameterization Documentation is still being developed and should be considered a draft.
There is no description of how to replace the soils data available at this time.
Use the default data_bin, found in the GIS Weasel home directory (“weasel”), as the template for the data_bin that you are creating.
Available Scripts
Several utility scripts are available for processing the data_bin. It should be noted that these are not supported and there is no error trapping to speak of. Caveat emptor. The user should examine the code (it’s simple stuff) prior to using. If others have code to contribute please send it in (rviger@usgs.gov).
The ArcInfo Macro Language (AML) script, data_bin-cut.aml, will extract an AOI from the national data sets found in the user-created national data_bin. The data_bin is described in the header of the data_bin-cut.aml. Links to most of the data sources for the lower 48 United States are provided below.
Another handy utility is data_bin-project.aml, which will convert the coordinate system of all the grids in the data_bin according to the details defined in a user-supplied projection parameter file (the coordinate system of your elevation and data_bin data should be consistent).
Sources of Nationally Available Data for the data_bin:
- Forest Land Distribution Data for the United States
Proceed to the USFS link above for more detailed information on the manufacture of these data and for data covering Alaska and Hawaii. Be aware that these layers are in a generic image file format, and will require a small amount of processing. Coming soon: an AML to ingest this data into an ArcInfo GRID format.
- Forest Type Group Data (tree species) for the lower 48 states of the US.
Please check the metadata for this layer for information on coordinate system, resolution and classification scheme.
- [note: this data is no longer available] Forest Density Data for the lower 48 states of the US.
Please check the metadata for this layer for information on coordinate system, resolution and classification scheme.
The GIS Weasel uses the vertically and horizontally averaged version of the STATSGO database. The original can be obtained from the USDA Soil Survey Division National STATSGO Database Data Access page (http://www.ftw.nrcs.usda.gov/stat_data.html). A preliminary version of an ArcInfo Macro Language (AML) script (soils_convert.aml) for processing the USDA-standard STATSGO and SSURGO into the format used by the GIS Weasel is now available. The user is encouraged to contact the authors with feedback (corresponding author: rviger@usgs.gov).
Please check the metadata for this layer for information on coordinate system, resolution and classification scheme. Information specific to Version 2 is soon to be posted in a “What’s New” section on the USGS Global Land Cover Characterization page.
Note that the …/data_bin/lcov_comp/lulc GRID used by the GIS Weasel is a composite of this layer with the Forest Type Group Data listed above.
Higher Resolution Data for the data_bin:
Note that the data referenced in this section may be used to replace the default data used by standard parameterization routines of the GIS Weasel. The default contents of the GIS Weasel data_bin are ArcInfo GRIDs that have a 1 kilometer cell size. These default data are suitable for application to large AOIs. If you feel that you need higher accuracy for your data_bin, you can replace the default layers with higher resolution and accuracy data. The user is encouraged to contact us for more details (rviger@usgs.gov). These references are provided for general usefulness, and support for using these layers will be provided on a limited basis.
Because of this data set’s 30 meter resolution, it can be used to replace the default land cover data (GLCC). The extent of geographic coverage is limited and the data must be ordered from the USGS National Mapping Division (there’s a shipping cost).
Ancillary Data for the data_bin:
Note that this data is not part of the standard parameterization routines of the GIS Weasel. These references are provided for general usefulness.
- Oregon State University's PRISM Precipitation Modeling Page
The PRISM data sets are modeled (predicted) national precipitation surfaces for different times of the year. If you are responsible for things financial, please be aware that the PRISM funding has been cut and needs your support!
- unofficial USGS 1:250,000 scale quadrangle boundaries
- unofficial USGS 1:100,000 scale quadrangle boundaries
- unofficial USGS 1:24,000 scale quadrangle boundaries
- GIS Data for Water Resources
- USGS distribution of 1:2,000,000 Hydrologic Units (HUCs)
- USGS distribution of 1:250,000 Hydrologic Units (HUCs)
- USGS-EPA National Hydrography Dataset
- Realtime USGS Streamflow Stations
- 1:2,000,000-scale Streams of the United States
- 1:2,000,000-scale States of the United States
- 1:2,000,000-scale Counties of the United States
- 1:100,000-scale Counties of the United States
- Places of the United States - from Census TIGER files
Downloads for Classes: