Nutrition Project 2 Practice Exam with Answers (49 Solved Questions)

Study smarter with Nutrition Project 2 Practice Exam with Answers, using real past exam questions for effective revision.

Sophia Lee
Contributor
about 1 year ago


project02

March 17, 2019

1 Project 2: Diet and Disease

Due at 11:59pm on Sunday, 3/17.

In this project, you will investigate the major causes of death in the world, as well as how one of these causes, heart disease, might be linked to diet!

1.0.1 Logistics

Deadline. This project is due at 11:59pm on Sunday, 3/17. It’s much better to be early than late, so start working now.

Free Response Questions. The free response questions and plots for the project are optional and ungraded, meaning you do not need to submit a PDF of this notebook to Gradescope. These questions tend to be open-ended. However, these questions will be very good practice for the free-response exam questions, so do give them a good effort. Solutions will be posted after the project late submission deadline.

Partners. You may work with one other partner. Your partner must be enrolled in the same lab as you are. Only one of you is required to submit the project. On okpy.org, the person who submits should also designate their partner so that both of you receive credit.

Rules. Don’t share your code with anybody but your partner. You are welcome to discuss questions with other students, but don’t share the answers. The experience of solving the problems in this project will prepare you for exams (and life). If someone asks you for the answer, resist! Instead, you can demonstrate how you would solve a similar problem.

Support. You are not alone! Come to office hours, post on Piazza, and talk to your classmates. If you want to ask about the details of your solution to a problem, make a private Piazza post and the staff will respond.

Tests. Passing the tests for a question does not mean that you answered the question correctly. Tests usually only check that your table has the correct column labels. However, more tests will be applied to verify the correctness of your submission in order to assign your final score, so be careful and check your work!

Advice. Develop your answers incrementally. To perform a complicated table manipulation, break it up into steps, perform each step on a different line, give a new name to each result, and check that each intermediate result is what you expect. You can add any additional names or functions you want to the provided cells.

All of the concepts necessary for this project are found in the textbook. If you are stuck on a particular problem, reading through the relevant textbook section often will help clarify the concept.

To get started, load datascience, numpy, pyplot, and ok.


In [1]: from datascience import *
        import numpy as np

        %matplotlib inline
        import matplotlib.pyplot as plots
        plots.style.use('fivethirtyeight')

        from client.api.notebook import Notebook
        ok = Notebook('project02.ok')
        _ = ok.auth(inline=True)

=====================================================================
Assignment: Project 2: Diet and Disease
OK, version v1.13.11
=====================================================================

Successfully logged in as cat028@ucsd.edu

2 Diet and Cardiovascular Disease

Death and its many causes are often a disconcerting topic for polite conversation. However, the more we know about it, the better equipped we are to prevent our early demise. As the acclaimed Professor Albus Dumbledore once said, “After all, to the well-organized mind, death is but the next great adventure.”

In the following analysis, we will investigate the world’s most dangerous killer: Cardiovascular Disease. Your investigation will take you across decades of medical research, and you’ll look at multiple causes and effects across two different studies.

Here is a roadmap for this project:

• In Part 1, we’ll investigate the major causes of death in the world during the past century (from 1900 to 2015).
• In Part 2, we’ll look at data from the Framingham Heart Study, an observational study into cardiovascular health.
• In Part 3, we’ll examine the clinical trials from the Minnesota Coronary Experiment and introduce our second dataset.
• In Part 4, we’ll run a hypothesis test on our observed data from the Minnesota Coronary Experiment.
• In Part 5, we’ll conclude the experiment and reflect on what we’ve learned about the relationship between diet and cardiovascular disease.

2.1 Part 1: Causes of Death

In order to get a better idea of how we can most effectively prevent deaths, we need to first figure out what the major causes of death are. Run the following cell to read in and view the causes_of_death table, which documents the death rate for major causes of deaths over the last century (1900 until 2015).


In [2]: causes_of_death = Table.read_table('causes_of_death.csv')
        causes_of_death.show(5)

<IPython.core.display.HTML object>

Each entry in the column Age Adjusted Death Rate is a death rate for a specific Year and Cause of death.

The Age Adjusted specification in the death rate column tells us that the values shown are the death rates that would have existed if the population under study in a specific year had the same age distribution as the “standard” population, a baseline. This is so we can compare death rates across years without worrying about changes in the demographics of our population.

Question 1: What are all the different causes of death in this dataset? Assign an array of all the unique causes of death to all_unique_causes.

In [3]: all_unique_causes = causes_of_death.group('Cause').column('Cause')
        sorted(all_unique_causes) # This line displays your array in alphabetical order

Out[3]: ['Accidents', 'Cancer', 'Heart Disease', 'Influenza and Pneumonia', 'Stroke']

In [4]: _ = ok.grade('q1_1')

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Running tests

---------------------------------------------------------------------
Test summary
    Passed: 1
    Failed: 0
[ooooooooook] 100.0% passed

Question 2: We would like to plot the death rate for each disease over time. To do so, we must create a table with one column for each cause and one row for each year.

Create a table called causes_for_plotting. It should have one column called Year, and then a column with age-adjusted death rates for each of the causes you found in Question 1. There should be as many of these columns in causes_for_plotting as there are causes in Question 1.

Hint: Use pivot, and think about how the elem function might be useful in getting the Age Adjusted Death Rate for each cause and year combination.

In [5]: def elem(x):
            return x.item(0)

In [6]: causes_for_plotting = causes_of_death.pivot('Cause', 'Year', values='Age Adjusted Death
        causes_for_plotting.plot('Year') # Do not change this line
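The reshaping asked for in Question 2 (one row per year, one column per cause) can be sketched without the `datascience` package. The sketch below is only an illustration in plain Python: the `records` values are made up, and `pivot_by_year` is a hypothetical helper, not part of the notebook. In the notebook itself the same job is presumably done by the `pivot` call in cell 6, whose ending is cut off in this copy; the hint suggests it passes `elem` to collect the single rate for each year and cause.

```python
# Hypothetical long-format records: (year, cause, age-adjusted death rate).
# The numbers are invented for illustration; they are not from causes_of_death.csv.
records = [
    (1900, "Heart Disease", 265.4),
    (1900, "Cancer", 114.8),
    (1901, "Heart Disease", 272.6),
    (1901, "Cancer", 118.1),
]


def pivot_by_year(rows):
    """Return {year: {cause: rate}}: one entry per year, one key per cause."""
    wide = {}
    for year, cause, rate in rows:
        # Each (year, cause) pair occurs once, so we keep that single value,
        # which is the role the notebook's elem function appears to play.
        wide.setdefault(year, {})[cause] = rate
    return wide


# Unique causes, as in Question 1, in alphabetical order.
unique_causes = sorted({cause for _, cause, _ in records})
wide_table = pivot_by_year(records)

print(["Year"] + unique_causes)
for year in sorted(wide_table):
    print([year] + [wide_table[year][cause] for cause in unique_causes])
```

Each printed row has the year followed by one rate per cause, which is the wide layout a line plot over `Year` needs.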


Let’s examine the graph above. You’ll see that in the 1960s, the death rate due to heart disease steadily declines. Up until then, the effects of smoking, blood pressure, and diet on the cardiovascular system were unknown to researchers. Once these factors started to be noticed, doctors were able to recommend a lifestyle change for at-risk patients to prevent heart attacks and heart problems.

Note, however, that the death rate for heart disease is still higher than the death rates of all other causes. Even though the death rate is starkly decreasing, there’s still a lot we don’t understand about the causes (both direct and indirect) of heart disease.

2.2 Part 2: The Framingham Heart Study

The Framingham Heart Study is an observational study of cardiovascular health. The initial study followed over 5,000 volunteers for several decades, and followup studies even looked at their descendants. In this section, we’ll investigate some of its key findings about diet, cholesterol, and heart disease.

Run the cell below to examine data for almost 4,000 subjects from the first wave of the study, collected in 1956.

In [7]: framingham = Table.read_table('framingham.csv')
        framingham

Out[7]: AGE  | SYSBP | DIABP | TOTCHOL | CURSMOKE | DIABETES | GLUCOSE | DEATH | ANYCHD
        39   | 106   | 70    | 195     | 0        | 0        | 77      | 0     | 1
        46   | 121   | 81    | 250     | 0        | 0        | 76      | 0     | 0
        48   | 127.5 | 80    | 245     | 1        | 0        | 70      | 0     | 0
        61   | 150   | 95    | 225     | 1        | 0        | 103     | 1     | 0
        46   | 130   | 84    | 285     | 1        | 0        | 85      | 0     | 0
        43   | 180   | 110   | 228     | 0        | 0        | 99      | 0     | 1
        63   | 138   | 71    | 205     | 0        | 0        | 85      | 0     | 1
        45   | 100   | 71    | 313     | 1        | 0        | 78      | 0     | 0
        52   | 141.5 | 89    | 260     | 0        | 0        | 79      | 0     | 0
        43   | 162   | 107   | 225     | 1        | 0        | 88      | 0     | 0
        ... (3832 rows omitted)


Each row contains data from one subject. The first seven columns describe the subject at the time of their initial medical exam at the start of the study. The last column, ANYCHD, tells us whether the subject developed some form of heart disease at any point after the start of the study.

You may have noticed that the table contains fewer rows than subjects in the original study: this is because we are excluding subjects who already had heart disease as well as subjects with missing data.

2.2.1 Section 1: Diabetes and the population

Before we begin our investigation into cholesterol, we’ll first look at some limitations of this dataset. In particular, we will investigate ways in which this is or isn’t a representative sample of the population by examining the number of subjects with diabetes.

According to the CDC, the prevalence of diagnosed diabetes (i.e., the percentage of the population who have it) in the U.S. around this time was 0.93%. We are going to conduct a hypothesis test with the following null and alternative hypotheses:

Null Hypothesis: The probability that a participant within the Framingham Study has diabetes is equivalent to the prevalence of diagnosed diabetes within the population (i.e., any difference is due to chance).

Alternative Hypothesis: The probability that a participant within the Framingham Study has diabetes is different than the prevalence of diagnosed diabetes within the population.

We are going to use the absolute distance between the observed prevalence and the true population prevalence as our test statistic. The column DIABETES in the framingham table contains a 1 for subjects with diabetes and a 0 for those without.

Question 1: What is the observed value of the statistic in the data from the Framingham Study? You should convert prevalences to proportions before calculating the statistic! (re-read the question if you are not sure what it is asking)

In [8]: obs_db = sum(framingham.column('DIABETES'))/framingham.num_rows
        observed_diabetes_distance = abs(obs_db - 0.0093)
        observed_diabetes_distance

Out[8]: 0.018029515877147319

In [9]: _ = ok.grade('q2_1_1')

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Running tests

---------------------------------------------------------------------
Test summary
    Passed: 1
    Failed: 0
[ooooooooook] 100.0% passed

Question 2: The array diabetes_proportions contains the proportions of the population without and with diabetes. Complete the following code to simulate 5000 values of the statistic under the null hypothesis.


In [10]: diabetes_proportions = make_array(.9907, .0093)

         diabetes_simulated_stats = make_array()
         repetitions = 5000
         for i in np.arange(repetitions):
             simulated_stat = sample_proportions(4000, diabetes_proportions).item(1)
             diabetes_simulated_stats = np.append(diabetes_simulated_stats, simulated_stat)

         diabetes_simulated_stats

Out[10]: array([ 0.00825, 0.0085 , 0.00975, ..., 0.0075 , 0.0075 , 0.00975])

In [11]: _ = ok.grade('q2_1_2')

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Running tests

---------------------------------------------------------------------
Test summary
    Passed: 1
    Failed: 0
[ooooooooook] 100.0% passed

Question 3: Run the following cell to generate a histogram of the simulated values of your statistic, along with the observed value.

Make sure to run the cell that draws the histogram, since it will be graded.

In [12]: Table().with_column('Simulated distance to true incidence', diabetes_simulated_stats).
         plots.scatter(observed_diabetes_distance, 0, color='red', s=30)

Out[12]: <matplotlib.collections.PathCollection at 0x7f4541e6eda0>
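As a cross-check on the idea behind Question 2, here is a minimal standard-library sketch of simulating the statistic under the null hypothesis. It is not the notebook's code: `simulate_distance` is a hypothetical helper, the seed and the reduced repetition count are arbitrary choices to keep it fast, and each simulated value is the absolute distance from 0.0093, following the test statistic as stated in the text. The sample size of 4000 simply mirrors the cell above.

```python
import random


def simulate_distance(sample_size, null_prevalence, rng):
    """Draw one sample under the null and return |sample proportion - null prevalence|."""
    cases = sum(1 for _ in range(sample_size) if rng.random() < null_prevalence)
    return abs(cases / sample_size - null_prevalence)


rng = random.Random(0)      # fixed seed (an arbitrary choice) so reruns give the same draws
null_prevalence = 0.0093    # prevalence of diagnosed diabetes quoted in the text
sample_size = 4000          # mirrors the sample size used in cell 10
repetitions = 200           # far fewer than 5000, only to keep this sketch quick

simulated = [
    simulate_distance(sample_size, null_prevalence, rng)
    for _ in range(repetitions)
]

observed_distance = 0.018029515877147319  # Out[8] from the notebook

# Empirical p-value: share of simulated distances at least as large as the observed one.
p_value = sum(d >= observed_distance for d in simulated) / repetitions
print("largest simulated distance:", max(simulated))
print("empirical p-value:", p_value)
```

Comparing the largest simulated distance with the observed distance gives a quick sense of how unusual the Framingham sample would be if the null hypothesis held.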


Question 4: Based on the results of the test and the empirical distribution of the test statistic under the null, should you reject the null hypothesis?

Yes, the observed value is very different than the simulated values.

Question 5: You know that the study was well-designed to represent the population. Why might there be a difference between the population and the sample? Assign the name framingham_diabetes_explanations to a list of the following explanations that are possible and consistent with the observed data and hypothesis test results.

1. Diabetes was under-diagnosed in the population (i.e., there were a lot of people in the population who had diabetes but weren’t diagnosed).
2. Healthy (non-diabetic) people are more likely to volunteer for the study.
3. The relatively wealthy population in Framingham ate a luxurious diet high in sugar (high-sugar diets are a known cause of diabetes).
4. The Framingham Study subjects were older on average than the general population, and therefore more likely to have diabetes.

In [13]: framingham_diabetes_possibilities = [1,3,4]
         framingham_diabetes_possibilities

Out[13]: [1, 3, 4]

In [14]: _ = ok.grade('q2_1_5')

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Running tests


---------------------------------------------------------------------
Test summary
    Passed: 1
    Failed: 0
[ooooooooook] 100.0% passed

In real-world studies, getting a truly representative random sample of the population is often incredibly difficult. Even just to accurately represent all Americans, a truly random sample would need to examine people across geographical, socioeconomic, community, and class lines (just to name a few). For a study like this, scientists would also need to make sure the medical exams were standardized and consistent across the different people being examined. In other words, there’s a tradeoff between taking a more representative random sample and the cost of collecting all the data from the sample.

The Framingham study collected high-quality medical data from its subjects, even if the subjects may not be a perfect representation of the population of all Americans. This is a common issue that data scientists face: while the available data aren’t perfect, they’re the best we have. The Framingham study is generally considered the best in its class, so we’ll continue working with it while keeping its limitations in mind.

(For more on representation in medical study samples, you can read these recent articles from NPR and Scientific American).

2.2.2 Section 2: Cholesterol and Heart Disease

Next, we are going to examine one of the main findings of the Framingham study: an association between serum cholesterol (i.e., how much cholesterol is in someone’s blood) and whether or not that person develops heart disease.

We’ll use the following null and alternative hypotheses:

Null Hypothesis: In the population, the distribution of cholesterol levels among those who get heart disease is the same as the distribution of cholesterol levels among those who do not.

Alternative Hypothesis: The cholesterol levels of people in the population who get heart disease are higher, on average, than the cholesterol level of people who do not.

Question 1: From the provided Null and Alternative Hypotheses, what seems more reasonable to use, A/B Testing or the Standard Hypothesis Testing? Assign the variable reasonable_test to one of the following choices.

1. A/B Testing
2. Standard Hypothesis Test

In [15]: reasonable_test = 1 #A/B testing: two numerical samples come from the same underlying
         reasonable_test

Out[15]: 1

In [16]: _ = ok.grade('q2_2_1')

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Running tests


---------------------------------------------------------------------
Test summary
    Passed: 1
    Failed: 0
[ooooooooook] 100.0% passed

Question 2: Now that we have a null hypothesis, we need a test statistic. Explain and justify your choice of test statistic in two sentences or less.

Hint: Remember that larger values of the test statistic should favor the alternative over the null.

Test Statistic: Since ‘ANYCHD’ does not have an equal proportion of 0’s to 1’s, I chose my test statistic to be the absolute difference of the means between the cholesterol levels of people with and without heart disease.

Question 3: Write a function that computes your test statistic. It should take a table with two columns, TOTCHOL and ANYCHD, and compute the test statistic you described above.

Hint: think about what ANYCHD means in the table.

In [17]: def compute_framingham_test_statistic(tbl):
             #f_test_stat1 = np.mean(abs(tbl.where('ANYCHD', 1).column('TOTCHOL') - tbl.where('
             ch_w_hd = tbl.where('ANYCHD', 1).column('TOTCHOL')
             ch_wo_hd = tbl.where('ANYCHD', 0).column('TOTCHOL')
             #f_test_stat2 = sum(np.abs(ch_w_hd - ch_wo_hd))/2 #total variation distance
             #can't do f_test_stat 1 or 2 bc ch_w_hd and ch_wo_hd have diff lengths
             f_test_stat = np.mean(ch_w_hd) - np.mean(ch_wo_hd) #why no abs?
             return f_test_stat

In [18]: _ = ok.grade('q2_2_3')

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Running tests

---------------------------------------------------------------------
Test summary
    Passed: 1
    Failed: 0
[ooooooooook] 100.0% passed

Question 4: Use the function you defined above to compute the observed test statistic, and assign it to the name framingham_observed_statistic.

In [ ]: framingham_observed_statistic = (compute_framingham_test_statistic(framingham))
        framingham_observed_statistic

Out[ ]: 16.635919905689406
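The statistic computed in cell 17 is a signed difference of group means (heart disease minus no heart disease), which fits the one-sided alternative: large positive values favor it. A minimal sketch of the same calculation in plain Python follows. `difference_of_means` is a hypothetical stand-in for the notebook function, and the input is only the ten preview rows from Out[7], so the result says nothing about the full table and will not match the 16.64 reported above.

```python
from statistics import mean


def difference_of_means(rows):
    """rows: iterable of (totchol, anychd) pairs.

    Returns mean cholesterol of the ANYCHD == 1 group minus that of the
    ANYCHD == 0 group. The sign is kept on purpose: the alternative is
    one-sided ("higher, on average"), so only large positive values count
    as evidence for it.
    """
    with_chd = [chol for chol, chd in rows if chd == 1]
    without_chd = [chol for chol, chd in rows if chd == 0]
    return mean(with_chd) - mean(without_chd)


# (TOTCHOL, ANYCHD) pairs copied from the ten preview rows shown in Out[7].
preview_rows = [
    (195, 1), (250, 0), (245, 0), (225, 0), (285, 0),
    (228, 1), (205, 1), (313, 0), (260, 0), (225, 0),
]

print(difference_of_means(preview_rows))
```

With so few rows the sign and size of the difference are dominated by chance, which is one reason the project runs the test on all 3,842 rows.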


In [ ]: _ = ok.grade('q2_2_4')

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Running tests

---------------------------------------------------------------------
Test summary
    Passed: 1
    Failed: 0
[ooooooooook] 100.0% passed

Now that we have defined hypotheses and a test statistic, we are ready to conduct a hypothesis test. We’ll start by defining a function to simulate the test statistic under the null hypothesis, and then use that function 1000 times to understand the distribution under the null hypothesis.

Question 5: Write a function to simulate the test statistic under the null hypothesis.

The simulate_framingham_null function should simulate the null hypothesis once (not 1000 times) and return the value of the test statistic for that simulated sample.

Hint:
* Simulate a new sample (should it be with replacement or not?)
* Think about what you want to change and what you want to keep the same when you resample

In [ ]: def simulate_framingham_null():
            shuffled_frame = framingham.sample(with_replacement=False).column('TOTCHOL') #check
            sim_table_frame = framingham.with_column('TOTCHOL', shuffled_frame)
            return compute_framingham_test_statistic(sim_table_frame)

In [ ]: # Run your function once to make sure that it works.
        simulate_framingham_null()

Out[ ]: -1.3361179346910887

In [ ]: _ = ok.grade('q2_2_5')

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Running tests

---------------------------------------------------------------------
Test summary
    Passed: 1
    Failed: 0
[ooooooooook] 100.0% passed

Question 6: Fill in the blanks below to complete the simulation for the hypothesis test. Your simulation should compute 1000 values of the test statistic under the null hypothesis and store the result in the array framingham_simulated_stats.

Hint: You should use the function you wrote above in Question 5.

Note: Running your code might take a few minutes! We encourage you to check your simulate_framingham_null() code to make sure it works correctly before running this cell.


In [ ]: framingham_simulated_stats = make_array()
        repetitions = 1000
        for i in range(repetitions):
            sim_stat = simulate_framingham_null()
            framingham_simulated_stats = np.append(framingham_simulated_stats, sim_stat)

The following line will plot the histogram of the simulated test statistics, as well as a point for the observed test statistic.

In [ ]: Table().with_column('Simulated statistics', framingham_simulated_stats).hist()
        plots.scatter(framingham_observed_statistic, 0, color='red', s=30);

Question 7: Compute the p-value for this hypothesis test, and assign it to the name framingham_p_value.

Hint: One of the key findings of the Framingham study was a strong association between cholesterol levels and heart disease. If your p-value doesn’t match up with this finding, you may want to take another look at your test statistic and/or your simulation.

In [ ]: framingham_p_value = np.count_nonzero(framingham_simulated_stats >= framingham_observed
        framingham_p_value

Out[ ]: 0.0

In [ ]: _ = ok.grade('q2_2_7')
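Questions 5 through 7 together amount to a permutation test: shuffle the cholesterol values while keeping the heart-disease labels fixed, recompute the difference of means, and report the share of shuffled statistics at least as large as the observed one. The sketch below illustrates that procedure with the standard library only. The `cholesterol` and `heart_disease` lists are small invented data, and `permutation_p_value` is a hypothetical helper; it is a sketch of the general technique under those assumptions, not a reconstruction of the truncated p-value line in the notebook.

```python
import random
from statistics import mean


def difference_of_means(values, labels):
    """Mean of values labelled 1 minus mean of values labelled 0."""
    group1 = [v for v, label in zip(values, labels) if label == 1]
    group0 = [v for v, label in zip(values, labels) if label == 0]
    return mean(group1) - mean(group0)


def permutation_p_value(values, labels, repetitions, rng):
    """One-sided permutation test: returns (observed statistic, empirical p-value)."""
    observed = difference_of_means(values, labels)
    shuffled = list(values)  # copy so the caller's data is left untouched
    at_least_as_large = 0
    for _ in range(repetitions):
        # Shuffling without replacement breaks any link between value and label,
        # which is what the null hypothesis claims.
        rng.shuffle(shuffled)
        if difference_of_means(shuffled, labels) >= observed:
            at_least_as_large += 1
    return observed, at_least_as_large / repetitions


# Invented toy data: five subjects with heart disease (1) and seven without (0).
cholesterol = [260, 255, 270, 248, 265, 200, 210, 195, 205, 215, 190, 220]
heart_disease = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0]

observed, p_value = permutation_p_value(
    cholesterol, heart_disease, repetitions=1000, rng=random.Random(0)
)
print("observed difference:", observed)
print("empirical p-value:", p_value)
```

Keeping the labels fixed and shuffling only the values is a design choice that preserves both group sizes in every simulated sample, which matters here because the two groups are far from equal in size.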