Below are some things to think about as you choose a data set and question.
- What is the ideal data for my question?
- What actual data is out there that I can get?
  - If your search turns up little, ask Prof. Fletcher, Andy (PLA), or a research librarian for help.
- What will the limitations on my hypotheses be if I don’t have the ideal data?
- Should I choose something else?
Once you have chosen your data, also consider these questions:
- Can I get the data into Stata?
- Do the data make sense?
  - Check for missing observations and for implausible means, variances, minima, or maxima.
- Do I know what the variables mean? (Have I read the metadata that accompanies the files, such as documentation and codebooks?)
- If I have to join two datasets, have I matched up observations properly?
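
The checks in this second list can be sketched as a short Stata do-file. This is only an illustration under stated assumptions, not a required workflow: it uses Stata's bundled `auto.dta` in place of your own data, the filename `mydata.csv` in the comment is hypothetical, and the "second dataset" is manufactured by splitting `auto.dta` so the merge step has something to join.

```stata
* Sketch only: auto.dta stands in for your own data.
* To load your own file instead, something like this may apply
* (the filename is hypothetical):
*   import delimited using "mydata.csv", varnames(1) clear
sysuse auto, clear

* Do the data make sense? Look at types, labels, and basic statistics.
describe
summarize              // means, std. dev., minima, maxima
misstable summarize    // which variables have missing values, and how many

* Do I know what the variables mean? Inspect values and labels,
* then compare against the documentation or codebook for the source.
codebook rep78 foreign

* Joining two datasets: confirm the key uniquely identifies observations.
isid make

* Manufacture a second dataset (make + price) purely for illustration.
preserve
keep make price
tempfile prices
save `prices'
restore
drop price

* Merge and check that observations matched as expected.
merge 1:1 make using `prices'
tabulate _merge
assert _merge == 3     // stops with an error if any observation failed to match
```

With your own data, unmatched observations (`_merge` equal to 1 or 2) are not necessarily a mistake, but you should be able to explain each of them before dropping the `assert`.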