Instructions for Using R-project Software for Analyzing Industrial Hygiene Data
Including Data with Non-Detects and Non-parametric Data
- Introduction: These instructions describe how to generate
industrial hygiene metrics using R routines and
MS Excel for Windows. MS Excel can be used to clean up
data, group the data, create text files for analysis,
and create tables and charts for final reports.
Any word processing, spreadsheet, or database software
could be used to create the text file. The Statistical
Analysis of Non-Detects (sand) package contains
R routines that read text files and generate an
output file containing metrics. The metrics generated
are those recommended by the AIHA¹ and
are interpreted and used as described in the book.
- You should have already gone to http://www.csm.ornl.gov/esh/statoed/
and followed the instructions for installing R and
the “sand” package. Now you can use R to analyze
data in a text file. The text file must have at
least two columns. One column contains a value and
the second contains a flag: 0 for a non-detect or 1 for a detect. Each
column must have a one-word heading. The following
describes one way to create the text file using
MS Excel.
¹ Ignacio, J.S., and W.H. Bullock:
A Strategy for Assessing and Managing Occupational
Exposures, 3rd ed. Fairfax, Va.: AIHA Press, 2006.
- This example uses data from Table IV.3 of the
AIHA book and is also the example data in the file
aihand.txt included in the download from the “statoed”
web site. The first column “Monitoring Data (mg/m3)”
contains a mix of values and the text symbol “<”.
The “Value” and “Detected 0=No 1=Yes” columns were
created using a variety of Excel text editing and
logic functions. For example, the Excel formula
=IF(LEFT(A2,1)="<",VALUE(REPLACE(A2,1,1,0)),A2)
replaces a leading “<” with a zero and converts the text
to a number, which strips the symbol when it appears. While
not important for a small file like this, these
editing functions are very helpful when cleaning up
large data sets.
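The same clean-up can also be done in R instead of Excel. The following is a minimal sketch in base R, not part of the “sand” package; the input values are made up for illustration (they are not the Table IV.3 data), and the column names Value and Detected are simply one choice of one-word headings.

```r
# Illustrative raw results as text; "<" marks a non-detect.
# These values are invented for the example, not taken from Table IV.3.
raw <- c("0.8", "<0.5", "2.1", "<0.5", "3.4")

# TRUE where the entry begins with the "<" symbol
is_nd <- startsWith(raw, "<")

# Strip a leading "<" (if any) and convert the text to a number
value <- as.numeric(sub("^<", "", raw))

# Flag column: 0 = non-detect, 1 = detect
detected <- ifelse(is_nd, 0L, 1L)

dat <- data.frame(Value = value, Detected = detected)
print(dat)
```

Unlike the Excel formula, this drops the “<” directly rather than replacing it with a zero; the resulting numbers are the same.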
- Columns F and G contain the values and flags that
you need to convert to a text file. Copy the two
columns, open a new file, and select “Paste Special”
and then “Values”.
- In the new file, column B has a three-word heading, “Detected
0=No 1=Yes”, and this has to be changed to a one-word
heading such as “Detected” or “Flag”. The next
step is to save the file in the rmain
folder as a tab-delimited text file and then close it. Click through
the screens warning of the loss of formatting, etc.
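If the values and flags are already in R, a tab-delimited file with one-word headings can be written directly. This is a sketch under assumptions: the data are invented, the file name example.txt is hypothetical, and the file is written to a temporary folder rather than to the rmain folder.

```r
# Invented example data with one-word column headings
dat <- data.frame(Value = c(0.8, 0.5, 2.1), Detected = c(1L, 0L, 1L))

# Hypothetical file name; a temporary folder stands in for "rmain" here
path <- file.path(tempdir(), "example.txt")

# Tab-delimited, no quotes around text, no row-number column
write.table(dat, file = path, sep = "\t", quote = FALSE, row.names = FALSE)

# Read the file back to confirm the layout: a header row plus two columns
check <- read.delim(path)
str(check)
```

Reading the file back is a quick way to confirm that the headings survived as single words and that the flag column contains only 0 and 1.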
- Open the R console by double clicking on the icon
in the “rmain” folder you created. Type in the command
“aihand<-readss("aihand",L=5)”. The
L=5 is the value of the OEL being used to interpret
data. Hit “Enter” and the file is read and analyzed.
Once you have typed in commands, the up and down
arrows toggle through the commands you have used.
If you “Save Workspace Image” at the end of the
R session, the commands will be saved. With a large
data set, you will often group the data into subsets
based on some variable (location, time, individual,
etc.) and create several text files for analysis.
To analyze each file, press the up arrow and edit the file name in the
command (e.g., Bldg1<-readss("Bldg1",L=5), Bldg2<-readss("Bldg2",L=5),
etc.). When each analysis is complete, the prompt
returns.
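Splitting a large data set into one text file per group can also be scripted. The sketch below uses base R only; the data, the Location column, and the output folder are assumptions made for illustration, and the readss calls appear only as comments because they require the “sand” routines to be loaded.

```r
# Invented data with a grouping variable (Location)
all_data <- data.frame(
  Location = c("Bldg1", "Bldg1", "Bldg1", "Bldg2", "Bldg2", "Bldg2"),
  Value    = c(0.8, 0.5, 2.1, 1.4, 0.5, 3.0),
  Detected = c(1L, 0L, 1L, 1L, 0L, 1L)
)

# A temporary folder stands in for the "rmain" folder
out_dir <- tempdir()

# One data frame per location, keeping only the two columns needed
groups <- split(all_data[, c("Value", "Detected")], all_data$Location)

for (g in names(groups)) {
  write.table(groups[[g]], file = file.path(out_dir, paste0(g, ".txt")),
              sep = "\t", quote = FALSE, row.names = FALSE)
  # With the sand routines loaded, each file would then be analyzed
  # as described in the text, for example:
  #   Bldg1 <- readss("Bldg1", L = 5)
}

written <- file.exists(file.path(out_dir, paste0(names(groups), ".txt")))
print(written)
```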
- R creates a new comma-delimited text file, “aihandout.csv”,
that contains the metrics. This file can be opened with
MS Excel, and the two columns can be copied and pasted
into the spreadsheet you will be using for further
analysis and report writing.
- The “readss” command generates the following metrics.
The industrial hygienist chooses those that help
interpret the data. Mean and confidence intervals
are useful for decisions on exposure groups and
constructing job and exposure matrices. Upper tolerance
limits and percent exceedance are useful for determining
compliance and other day-to-day risk management
decisions. Parametric and non-parametric versions
of each are included.
| Label | Metric | Glossary |
| --- | --- | --- |
| mu | 0.925 | Maximum likelihood estimate (MLE) of the mean of the log-transformed data (log of GM) |
| se.mu | 0.099 | Estimate of the standard error of mu |
| sigma | 0.37 | MLE of the standard deviation of the log-transformed data (log of GSD) |
| se.sigma | 0.079 | Estimate of the standard error of sigma |
| GM | 2.522 | MLE of the geometric mean |
| GSD | 1.447 | MLE of the geometric standard deviation |
| EX | 2.7 | MLE of EX, the (arithmetic) mean |
| LCLa_95 | 2.26 | 95% Lower Confidence Limit (LCL) for EX |
| UCLa_95 | 3.226 | 95% Upper Confidence Limit (UCL) for EX |
| KMmean | 2.773 | Kaplan-Meier (KM) estimate of EX |
| KM.LCL | 2.29 | 95% LCL for KM EX |
| KM.UCL | 3.257 | 95% UCL for KM EX |
| KM.se | 0.269 | Standard error of KMmean |
| Xp.obs | 4.75 | Observed 95th percentile of the data |
| Xp | 4.633 | MLE of the 95th percentile |
| Xp.LCL | 3.521 | MLE of the 95% LCL for Xp |
| Xp.UCL | 6.096 | MLE of the 95% Upper Tolerance Limit (UTL) of Xp |
| NpUTL | NA | Nonparametric estimate of the 95% UTL of Xp |
| Maximum | 5.5 | Largest value in the data set |
| nonDet% | 20 | Percent of Xs that are left-censored |
| n | 15 | Number of observations in the data set |
| Rsq | 0.969 | Square of the correlation for the data and the standard lognormal |
| m | 12 | Number of detected Xs |
| f | 3.208 | MLE of the percent exceeding the specified limit L |
| f.LCL | 0.396 | MLE of the 95% LCL for f |
| f.UCL | 14.767 | MLE of the 95% UCL for f |
| fnp | 6.667 | Nonparametric estimate of f for limit L |
| fnp.LCL | 0.341 | Nonparametric estimate of the 95% LCL for f |
| FnUCL_95 | 27.94 | Nonparametric estimate of the 95% UCL for f |
| m2logL | 41.3044 | -2 times the log-likelihood function |
| L | 5 | Specified limit for the percent exceeding; e.g., the OEL |
| P | 0.95 | Percentile for the UTL p-gamma |
| gam | 0.95 | One-sided confidence level gamma. Default is 0.95 |
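Several of the parametric metrics in the table follow from mu and sigma through the standard lognormal relationships. The sketch below recomputes them in base R from the rounded table values; it illustrates those textbook formulas and is not the code used inside “readss”. Because mu and sigma are rounded, the results should be close to, but not exactly equal to, the tabulated GM, GSD, EX, Xp, and f.

```r
# Rounded values copied from the table
mu    <- 0.925   # mean of the log-transformed data
sigma <- 0.37    # standard deviation of the log-transformed data
L     <- 5       # specified limit (e.g., the OEL)
p     <- 0.95    # percentile of interest

GM  <- exp(mu)                      # geometric mean
GSD <- exp(sigma)                   # geometric standard deviation
EX  <- exp(mu + sigma^2 / 2)        # arithmetic mean of a lognormal
Xp  <- exp(mu + qnorm(p) * sigma)   # p-th percentile of a lognormal

# Percent of the lognormal distribution exceeding the limit L
f <- 100 * (1 - pnorm((log(L) - mu) / sigma))

print(round(c(GM = GM, GSD = GSD, EX = EX, Xp = Xp, f = f), 3))
```

The confidence limits and tolerance limits in the table depend on the standard errors and on the censored-data likelihood, so they are not reproduced by these simple formulas.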
- R will generate a log probability plot (also
called a Q-Q plot) that provides a visual check
of whether the data fit the lognormal model (see
the AIHA book). Creating a log probability plot
requires two commands. The first command, pnd<-plend(aihand),
creates a data frame. The second, qq.lnorm(pnd),
generates the probability plot. This plot displays only detected
values and displays replicates as a single data
point, which aids the visual check when the data
set is large. Clicking the camera button copies
the image so that it can be pasted into Excel or
another document.
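For readers who want to see the idea behind such a plot, here is a simplified base R sketch. It is not the algorithm used by plend or qq.lnorm: it assumes invented data in which every non-detect lies below every detect, ranks all n observations to get plotting positions, and then plots only the detected values on a log scale against standard normal quantiles.

```r
# Invented data: two non-detects reported at 0.5, six detects
value    <- c(0.5, 0.5, 0.8, 1.2, 1.5, 2.1, 2.6, 3.4)
detected <- c(0, 0, 1, 1, 1, 1, 1, 1)

# Sort by value, placing non-detects before detects when values tie
ord <- order(value, detected)
v <- value[ord]
d <- detected[ord]

# Plotting positions use all n observations, so detects keep their rank
z <- qnorm(ppoints(length(v)))

# Only detected values are plotted and used in the fit
keep <- d == 1
x <- z[keep]
y <- log(v[keep])

plot(x, y, xlab = "Standard normal quantile", ylab = "log(Value)")
abline(lm(y ~ x))   # points near a straight line suggest lognormality

# Squared correlation as a rough numeric summary of linearity
rsq <- cor(x, y)^2
print(round(rsq, 3))
```

When non-detects are interspersed among detected values, simple ranking like this is no longer adequate, which is one reason to rely on the package routines for real data.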
- Once the metrics and Q-Q plot have been copied
into your spreadsheet, you can continue using them
to generate charts and tables needed to support
your data analysis and reporting.
- The metrics calculated by the “readss” command
can also be calculated separately. The commands
for these are described in the help menu, which
is shown when you type “help(sand)”. The “readss”
routine requires at least 3 detected results to
run. One function that can be used when all results are non-detects
is “nptl(n, p = 0.95, gam = 0.95)”, which provides
the order (rank) of the value in a data set with n values
that corresponds to the non-parametric upper tolerance
limit for the specified percentile p and confidence level gam.
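The order statistic for a non-parametric upper tolerance limit can be worked out from the binomial distribution. The helper below, np_utl_order, is a hypothetical name for a base R sketch of that standard calculation; it is not the “sand” package's nptl and may differ from it in detail.

```r
# Hypothetical helper (not the sand package's nptl).
# Returns the rank k such that the k-th smallest of n values is an upper
# tolerance limit for percentile p with confidence gam, or NA if n is too small.
np_utl_order <- function(n, p = 0.95, gam = 0.95) {
  # The k-th smallest value is at or above the p-th percentile with
  # probability pbinom(k - 1, n, p); qbinom gives the smallest k - 1
  # for which that probability reaches gam.
  k <- qbinom(gam, n, p) + 1
  if (k > n) NA_integer_ else as.integer(k)
}

# Ranks for a few sample sizes at the default p = 0.95 and gam = 0.95
print(sapply(c(15, 58, 59), np_utl_order))
```

With the defaults, no rank exists until n reaches 59, since 0.95^n must fall to 0.05 or below. That is consistent with the NA reported for NpUTL in the example with n = 15.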
This page was last updated on December 10, 2012.