Statistics in Stata


1. Let us return to Flint. Public health officials say that 5 ppb is considered an “acceptable” level of lead in the water. Yet health researchers stress that there is no safe level of lead, only what policymakers consider “acceptable”. In the exam review, we mentioned a study by researchers from Virginia Tech, who received water samples from 271 homes.

The raw data from this study are posted on Blackboard. Compute summary statistics in Stata and then manually test the hypothesis that Flint homes have lead levels that are significantly higher than the “acceptable” level. Choose the appropriate t-test, the alpha level, and a one- or two-tailed test, and show all steps of the hypothesis testing process. Present your findings in a format acceptable for publication. The citation for the study is: (2015) “Lead Results from Tap Water Sampling in Flint, MI during the Flint Water Crisis.”
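
As a rough guide to the "manual" part of the test, the summary statistics that Stata stores after summarize can be fed into the one-sample t formula, t = (mean - 5) / (sd / sqrt(n)). The sketch below assumes the lead variable is named ppb (as in the commands used later in this assignment). The alpha of 0.05 and the upper-tailed test are illustrative choices only, not the required answer, and the scalar names are made up for this example.

```stata
* Sketch only: one-sample t statistic built from summarize results.
* Assumes the variable is named ppb; 5 is the "acceptable" benchmark in ppb.
quietly summarize ppb
scalar x_bar  = r(mean)                 // sample mean
scalar s_dev  = r(sd)                   // sample standard deviation
scalar n_obs  = r(N)                    // number of homes with a ppb value
scalar se_bar = s_dev / sqrt(n_obs)     // standard error of the mean
scalar t_stat = (x_bar - 5) / se_bar    // t statistic against 5 ppb
scalar df_t   = n_obs - 1               // degrees of freedom
scalar t_crit = invttail(df_t, 0.05)    // upper-tail critical value, alpha = .05 (illustrative)
scalar p_one  = ttail(df_t, t_stat)     // upper-tail p-value
display "t statistic      = " t_stat
display "degrees freedom  = " df_t
display "critical value   = " t_crit
display "one-tailed p     = " p_one
```

You can check your hand calculation against Stata's built-in command, ttest ppb == 5, which reports the same t statistic together with two-tailed and one-tailed p-values.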

2. How do we make sense of the willingness to risk the health of the public, and especially of children? A rich body of critical theory (from critical class theory to critical race theory) points out that the poorest or most disadvantaged people often pay the steepest cost for government neglect. The reasons range from such people lacking access to the political or legal system to a disregard for the poor among those in positions of power. Indeed, Flint has one of the highest rates of childhood poverty in the state of Michigan (42% in Flint versus 16.2% in Michigan and 14.8% in the United States as a whole).

Can we use such a critical lens to understand the disparate impact even within the town of Flint? Flint is made up of 9 wards. Although the town as a whole is very poor, wards 6 and 7 in particular have seen large increases in unemployment and poverty rates and an increase in neglected properties. Use the dataset on lead levels in the 271 homes to determine whether homes in these two wards have statistically higher lead contamination than homes in the other wards. Calculate summary statistics in Stata and use those figures to manually carry out the relevant hypothesis test. Make all steps of this process explicit and present your results in a detailed, coherent narrative.

To simplify the calculations, use the following two commands, which summarize lead levels in wards 6 and 7 together, and in all other wards, respectively. (Note that ~= in Stata means “not equal”, | is the symbol for “or”, and & is the symbol for “and”. The second command needs &, because every home satisfies “not ward 6 or not ward 7”.)
sum ppb if ward==6 | ward==7
sum ppb if ward~=6 & ward~=7
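
The two sets of summary statistics can then be combined into a two-sample t statistic. The sketch below is one possible way to do this. It assumes the variables are named ppb and ward. It uses the unequal-variances (Welch) form with Satterthwaite degrees of freedom, which is only one of the options you might justify (the pooled, equal-variances form is the other). It uses Stata's inlist() function as shorthand for the ward conditions and excludes homes with a missing ward. The scalar names are made up for this example.

```stata
* Sketch only: two-sample t statistic (unequal variances) from summarize results.
* Assumes variables named ppb and ward; group A = wards 6 and 7, group B = all others.
quietly summarize ppb if inlist(ward, 6, 7)
scalar mean_a = r(mean)
scalar n_a    = r(N)
scalar v_a    = r(Var) / r(N)          // squared standard error, group A

quietly summarize ppb if !inlist(ward, 6, 7) & !missing(ward)
scalar mean_b = r(mean)
scalar n_b    = r(N)
scalar v_b    = r(Var) / r(N)          // squared standard error, group B

scalar se_diff = sqrt(v_a + v_b)                    // SE of the difference in means
scalar t_stat  = (mean_a - mean_b) / se_diff        // t statistic
* Satterthwaite approximation to the degrees of freedom
scalar df_w    = (v_a + v_b)^2 / (v_a^2/(n_a - 1) + v_b^2/(n_b - 1))
scalar p_one   = ttail(df_w, t_stat)                // upper-tail p-value

display "difference in means = " mean_a - mean_b
display "t statistic         = " t_stat
display "degrees of freedom  = " df_w
display "one-tailed p        = " p_one
```

If you decide that equal variances is the more defensible assumption, replace the standard error with the pooled version and use n_a + n_b - 2 degrees of freedom.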


3. In question 1 you determined whether or not the lead levels in Flint were statistically higher than the “acceptable” level of 5 ppb. State officials in Michigan may argue that this is due to outliers, that is, a few homes with very high levels that skew the distribution (i.e., a positive/right skew). Compute the mean and standard deviation of ppb for the homes in that study using Stata. Using what we know about z scores and the normal distribution:
a) Estimate the percentage of homes with ppb levels greater than the acceptable levels.
b) Public officials argue that, because of the positive skew in the distribution of lead levels in most towns, the important ppb level to consider when determining if levels are a public health problem is the 90th percentile. Based on your estimate in part a, are lead levels in Flint still a public health issue? Why?
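
The z-score steps in parts a and b can be sketched as follows, assuming the variable is named ppb and treating the distribution as normal, as the question instructs (a rough approximation if the data are strongly skewed). The scalar names are made up for this example.

```stata
* Sketch only: normal-approximation calculations from the mean and SD of ppb.
quietly summarize ppb
scalar x_bar   = r(mean)
scalar s_dev   = r(sd)
scalar z_five  = (5 - x_bar) / s_dev               // z score of the 5 ppb benchmark
scalar share_5 = 1 - normal(z_five)                // estimated share of homes above 5 ppb
scalar p90_est = x_bar + invnormal(0.90) * s_dev   // 90th percentile under normality
display "z score for 5 ppb          = " z_five
display "estimated share above 5    = " share_5
display "estimated 90th percentile  = " p90_est
```

For comparison, summarize ppb, detail reports the observed percentiles and the skewness of the sample, which shows how far the normal approximation departs from the data.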