Tips for Writing up your Project

TIPS FOR WRITING QM222 (A1) PROJECTS

I have found myself writing the following comments to many students regarding their drafts, so I am sharing them with you:

The project needs a specific client company or organization that you are addressing.
After you finish your next draft, do what I have told everyone to do: read the paper from beginning to end as if you are the client and KNOW NOTHING about its subject. Make sure that a person who knew nothing before picking up the report will understand exactly what you have done, and what they can learn from it, by the time they finish reading.
The project itself should start on a new page, with the title of the project on top. It should assume that the reader has not read the executive summary, and it should therefore include all the introductory material. Do not refer to the executive summary in the report itself. (In fact, it is best to write the executive summary AFTER you finish the project. You will probably repeat sentences or even paragraphs in the project and the executive summary.)
Organizationally, your paper should probably have sections, for instance: Introduction, Data Description, Results (or Detailed Statistical Analysis), and Conclusion. Within Results you should have subsections, each with a title. Use introductory sentences in each section and subsection so the reader knows what that section does.
Most people need a much fuller introduction explaining what question this research addresses, how it addresses it, and how it might be useful to the client. Some people summarize their findings in the introduction, but this is not necessary since there is an executive summary.
In the data section, it is a good idea to discuss descriptive statistics, particularly the distribution of Y and of your key X, the averages of your variables, etc. For averages, refer to a table of these descriptive statistics. (I suggest you make it from the output of the sum command, then put it in Excel, format it so it looks good, and give it a title.)
Include a citation (e.g., the website URL) for your data.
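As a rough illustration of what such a descriptive-statistics table contains, here is a minimal Python sketch (Python is an assumption for illustration, not the tool the handout refers to). It computes the number of observations, mean, standard deviation, minimum, and maximum for each variable and attaches reader-friendly labels instead of raw variable names. The variable names, labels, and data are hypothetical.

```python
import statistics

def describe(data, labels):
    """Return one row (label, N, mean, std. dev., min, max) per variable.

    data:   dict mapping a short variable name to a list of numbers
    labels: dict mapping the same names to reader-friendly labels
    """
    rows = []
    for name, values in data.items():
        rows.append((
            labels.get(name, name),                 # fall back to the raw name
            len(values),
            round(statistics.mean(values), 2),
            round(statistics.stdev(values), 2),     # sample standard deviation
            min(values),
            max(values),
        ))
    return rows

# Hypothetical illustrative data, not from any real survey
data = {"age": [18, 25, 34, 47, 60], "female": [1, 0, 1, 1, 0]}
labels = {"age": "Age (years)", "female": "Female (1 = yes)"}
for row in describe(data, labels):
    print(row)
```

The rows can then be pasted into a spreadsheet and given a title, as described above.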
Most people do not include enough details about the data set (e.g. Who is in the sample you use? What is the survey question that the key variables are based on?)
In your data section, it is good to introduce the methodology you use (regression) and to explain that in multiple regression each coefficient measures the effect of its variable holding all the other variables constant.
If your dependent variable is a dummy, explain how to interpret the coefficient (as percentage point differences in Y). … You might also add that since the average Y is __%, the percentage change in Y is around ____ times the coefficient. E.g., if average Y is .5, the percentage change is about twice the coefficient. (Then continue to make this percentage point/percentage distinction when talking about results.)
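The percentage-point and percent calculations can be written out as a minimal Python sketch. It assumes Y is coded 0/1, so the coefficient is a change in probability on a 0 to 1 scale: multiplying by 100 gives percentage points, and dividing by the average Y gives the approximate percent change. The function name and numbers are hypothetical.

```python
def dummy_y_effect(coef, mean_y):
    """Interpret a coefficient when Y is a 0/1 dummy (linear probability model).

    coef:   regression coefficient on X (Y measured as a 0-1 probability)
    mean_y: sample average of Y (a share between 0 and 1)
    Returns (percentage-point change, approximate percent change).
    """
    pct_points = coef * 100           # e.g. .10 on a 0-1 scale = 10 percentage points
    pct_change = coef / mean_y * 100  # change relative to the average Y
    return pct_points, pct_change

# Average Y of .5: the percent change is twice the coefficient (in percentage points)
print(dummy_y_effect(0.10, 0.5))
# Average Y of .25: the percent change is four times the coefficient
print(dummy_y_effect(0.10, 0.25))
```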
Many people could add value by elaborating on the coefficients. For instance, some people find it useful to add a sentence like: “The difference between the lowest and highest X value for this variable is ____, which can change Y by as much as …” [here multiply the coefficient by the max-min range].
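The max-min calculation is a single multiplication. A minimal Python sketch, with a hypothetical function name and hypothetical numbers:

```python
def max_min_effect(coef, values):
    """Predicted change in Y when X moves from its sample minimum to its maximum,
    holding the other regressors constant: coef * (max - min)."""
    return coef * (max(values) - min(values))

# Hypothetical: coefficient of 0.5 on years of schooling, which ranges from 8 to 20
print(max_min_effect(0.5, [8, 12, 16, 20]))  # prints 6.0
```

This applies to a variable that enters the regression linearly; a variable with a squared term needs the combined calculation described under Quadratics.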
All tables and graphs should have titles that tell the reader exactly what they show, just from looking at them. Number your tables and graphs, and refer to them in the text by number. All variable names in the tables should be intuitive, so the reader knows exactly what they are without referring to a different table. BE NICE TO YOUR READER/CLIENT or you won’t get hired again.
Do not use lots of decimals, even in your regression table. People’s eyes glaze over when they see 87.345972…. 87.3 would be better in the text, and 87.35 (4 significant digits) in the table.
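If you produce numbers programmatically, rounding to a fixed number of significant digits is easy to automate. A minimal Python sketch (the helper name is hypothetical):

```python
import math

def round_sig(x, sig):
    """Round a nonzero number x to `sig` significant digits."""
    # Number of decimal places needed so that `sig` digits remain
    digits = sig - 1 - math.floor(math.log10(abs(x)))
    return round(x, digits)

print(round_sig(87.345972, 3))  # 87.3, for the text
print(round_sig(87.345972, 4))  # 87.35, for a table
```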
In regression tables: if you have sets of category dummies, add a footnote that lists which categories are excluded. A footnote should also tell the reader whether t-stats or standard errors are in parentheses.
If at all possible, the regression table should be no longer than one page and should fit entirely on a single page, whether it is in the body of the paper or at the end.
Do not include a table in the text until after you refer to it. Tables in the appendix should be in the order they are referred to in the paper.
You need to define the variables as you use them so the reader does not have to keep going back and forth to the variable definition table.
Everyone needs to make a regression table. You should put all regressions in a single table, as I’ve shown in class.
Throughout the paper, discuss (adj) R-squared as “the percentage of the variation in ___(Y) explained by the variable(s) in the regression.”
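For reference, R-squared is 1 minus the ratio of unexplained (residual) variation to total variation in Y, and adjusted R-squared scales that ratio by (n - 1)/(n - k - 1), where k is the number of explanatory variables. A minimal Python sketch, assuming the fitted values come from an OLS regression with a constant; the five-observation data set is made up.

```python
def r_squared(y, y_hat, k):
    """Return (R-squared, adjusted R-squared).

    y:     observed values of the dependent variable
    y_hat: fitted values from an OLS regression with a constant
    k:     number of explanatory variables (not counting the constant)
    """
    n = len(y)
    y_bar = sum(y) / n
    ss_total = sum((yi - y_bar) ** 2 for yi in y)               # total variation in Y
    ss_resid = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # unexplained variation
    r2 = 1 - ss_resid / ss_total
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)  # penalizes extra regressors
    return r2, adj_r2

# Fitted values from regressing y on x = 1..5, which gives Y = 0.6 + 0.8X
y = [1, 3, 2, 5, 4]
y_hat = [1.4, 2.2, 3.0, 3.8, 4.6]
print(r_squared(y, y_hat, k=1))
```

Here R-squared is about 0.64, so X explains about 64% of the variation in Y; the adjusted value is about 0.52.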
When you report statistical significance, it is more helpful to be more specific than saying “with more than 95% (or 68%) certainty.” For instance, you could say that you know with 75% certainty, or with more than 99% certainty, that X increases Y.
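One way to get a specific certainty level from a coefficient's t-stat is to compute 1 minus its two-sided p-value. The minimal Python sketch below does this with the normal approximation to the t distribution, which is reasonable in large samples (an assumption; with few observations the exact t distribution gives slightly lower certainty). Under this approximation a t-stat of 1 gives about 68% and a t-stat of about 1.96 gives 95%.

```python
import math

def confidence_from_t(t):
    """Approximate certainty (1 minus the two-sided p-value) that a coefficient
    differs from zero, using the normal approximation to the t distribution."""
    # For a standard normal Z, P(|Z| < |t|) = erf(|t| / sqrt(2))
    return math.erf(abs(t) / math.sqrt(2))

# Hypothetical t-stats
for t in (1.0, 1.15, 2.0, 2.6):
    print(t, round(confidence_from_t(t), 3))
```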
Quadratics: When you have a linear and a squared term in the same variable (for instance, age and age squared), you cannot interpret the linear term’s coefficient by itself as that variable’s effect. Instead, both the linear and quadratic terms must be combined to figure out the effect. In general, if the quadratic is bX + aX², the best way to calculate the effect of a change in X is to take the derivative (or slope), which is b + 2aX. As you can see, the slope depends on the specific value of X (e.g., age) you are at. Moreover, you cannot use the t-stats of the two separate terms to measure the significance of the variable’s impact (let’s say age). Instead, you need to measure their joint significance at a specific age, let’s say 30. To do this, after the regression, type lincom age+agesq*2*30 (which is the slope at age 30).
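To make the lincom step concrete, here is a minimal Python sketch of the arithmetic behind that linear combination: the slope b + 2aX at a chosen X, and its standard error sqrt(Var(b) + (2X)^2 Var(a) + 2(2X) Cov(a, b)), the standard formula for the variance of a linear combination of two estimates. The function names and all coefficient and covariance values are hypothetical; in practice they would come from your regression output, and lincom does this for you.

```python
import math

def quadratic_slope(b, a, x):
    """Slope of b*X + a*X**2 at X = x, i.e. the derivative b + 2*a*x."""
    return b + 2 * a * x

def slope_se(var_b, var_a, cov_ab, x):
    """Standard error of b + 2*a*x, given the variances of b and a and their covariance."""
    c = 2 * x  # weight on a in the linear combination
    return math.sqrt(var_b + c ** 2 * var_a + 2 * c * cov_ab)

# Hypothetical estimates for age (b) and age squared (a), not from any real regression
b, a = 2.0, -0.02
var_b, var_a, cov_ab = 0.25, 0.0001, -0.004
slope = quadratic_slope(b, a, 30)          # 2.0 + 2*(-0.02)*30, about 0.8
se = slope_se(var_b, var_a, cov_ab, 30)    # sqrt(0.25 + 0.36 - 0.48), about 0.36
print(slope, se, slope / se)               # slope at age 30, its standard error, its t-stat
```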
More on Quadratics: A good way to illustrate the quadratic is to draw a graph so you can see the shape of the relationship. Use the range of X values (e.g., ages) in your sample. But many of you are confused about the Y-axis, since we are only including the contribution of these two terms to Y. Here is a way to solve this. Consider the variable age, and assume that the youngest people in your dataset are 18. Start the curve at the average Y of 18-year-olds in your sample; let’s say this is 100. Then make the rest of the curve change from there, by adding 100 to the predicted change in Y (relative to age 18) at each age.
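The anchoring step can be sketched in a few lines of Python: compute the contribution of the two age terms at each age, subtract their contribution at the starting age, and add the starting average of Y. The function name, coefficients, and the starting value of 100 are hypothetical; the resulting points could then be plotted with any charting tool.

```python
def anchored_curve(b, a, ages, start_age, start_y):
    """Points (age, Y) for the age profile implied by b*age + a*age**2,
    shifted so the curve passes through start_y at start_age."""
    base = b * start_age + a * start_age ** 2  # contribution of age at the starting age
    return [(x, start_y + (b * x + a * x ** 2) - base) for x in ages]

# Hypothetical coefficients; average Y of 18-year-olds assumed to be 100
points = anchored_curve(2.0, -0.02, range(18, 66, 12), start_age=18, start_y=100)
for age, y in points:
    print(age, round(y, 2))
```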
Linear probability models: The coefficients in a linear probability model aren’t percents; they are percentage points. For instance, let’s say your Y variable is the probability of depression, which averages .25 in your sample. Further, let’s say that the coefficient on female is .10. This means females are .10, or 10 percentage points, more likely to be depressed. What percent increase is this? .10/.25, or a 40% increase.