## Lesson Summary

Summary

In this lesson, students will learn how to acquire and analyze data to find answers to questions and solutions to problems. Students will consider whether or not the data they are presented with is necessarily valid, and research some of the various data sources online.

Outcomes

• Students will explore how computation can be employed to help people process data and information to gain insight and knowledge.
• Students will learn how computation can be used to facilitate exploration and discovery when working with data.
• Students will examine the considerations and trade-offs that arise in the computational manipulation of data.
• Students will identify evidence that digitally processed data can show a correlation between variables and that a correlation found in data does not necessarily indicate that a causal relationship exists.
• Students will explain why a single source may not contain the data needed to draw a conclusion, and why it may be necessary to combine data from a variety of sources to formulate a conclusion.
• Students will explore opportunities that large data sets provide for solving problems and creating knowledge.

Overview

Session 1

1. Getting Started (5 min) - Students journal on the importance of validating data
2. Discuss journal prompt (5 min)
3. Brainstorm types of online data (5 min)
4. Explore how meaning is created from data (10 min)
5. Work with data online (20 min)
6. Assign homework (5 min)

Session 2

1. Getting Started (5 min) - Students journal about what they learned the previous day
2. Analyzing Data (30 min) - Discuss correlation and causation; discuss and explore different types of data and analysis
3. Present homework findings (10 min)
4. Wrap Up - Journal (5 min)

## CSP Objectives

• EU DAT-1 - The way a computer represents data internally is different from the way the data is interpreted and displayed for the user. Programs are used to translate data into a representation more easily understood by people.
• LO DAT-1.A - Explain how data can be represented using bits.
• LO DAT-1.D - Compare data compression algorithms to determine which is best in a particular context.
• EU DAT-2 - Programs can be used to process data, which allows users to discover information and create new knowledge.
• LO DAT-2.A - Describe what information can be extracted from data.
• LO DAT-2.C - Identify the challenges associated with processing data.
• LO DAT-2.D - Extract information from data using a program.
• LO DAT-2.E - Explain how programs can be used to gain insight and knowledge from data.
• EU AAP-2 - The way statements are sequenced and combined in a program determines the computed result. Programs incorporate iteration and selection constructs to represent repetition and make decisions to handle varied input values.
• LO AAP-2.A - Express an algorithm that uses sequencing without using a programming language.
• LO AAP-2.L - Compare multiple algorithms to determine if they yield the same side effect or result.

## Key Concepts

Students will be able to acquire data and analyze it to find answers to a specific question or solutions for a specific problem.

## Essential Questions

• What opportunities do large data sets provide for solving problems and creating knowledge?

## Teacher Resources

Student computer usage for this lesson is: required

Student computer usage for the second session is: optional

In the Lesson Resources Folder

• PowerPoints: "Finding Data" and "Finding and Analyzing Data"
• Session 1 Homework: "Homework Unit 4 Lesson 1"

Webpages Session 1

Webpages Session 2

# Session 1

For this session, use the presentation "Finding Data" in the Lesson Resources Folder.

# Getting Started (5 min)

Given this data: [slide 1]

A blood drive at the local high school reveals that 20% of the students were HIV positive.

Journal on these questions:

• What is your immediate reaction?
• What questions do you have?

# Activities (40 minutes)

## Activity 1 (5 min) - Discuss the journal prompt

Lead the students in discussion using the bullets below and slide 2 of the PowerPoint as guidance. Students should talk about WHY they assumed the data was true or were uncomfortable questioning the truth of the data.

## Activity 2 (5 min) - Brainstorm: What kinds of data can be found online?

Part 1 - Discussion

Data comes from many places and takes many forms [slide 3]

• Have students discuss: How do businesses, individuals, governments, and devices create and use data?
• Do computers perceive and store data in the same way that humans do?
• Additional research is needed to understand the exact nature of the relationship.

Part 2 - Brainstorm

Brainstorm as a class: what kinds of data are generated? Possible answers:

• video: movies, webcam images, CCTV, YouTube, Netflix, Facebook, etc.
• pictures: maps, Instagram, photos, cartoons, drawings, …. everything!
• words: books, articles, news, stories, blogs, Facebook
• numbers: facts, financial transactions, scientific data
• sound: music, speech
• behavior tracking: GPS, click behavior, search history
• IMPORTANT POINT: Computers see and record digital data, which is only an approximation of the real world. The sample rate determines the accuracy of the digital approximation. The real world is analog. Analog data have values that change smoothly over time, rather than in discrete intervals. Some examples of analog data include the pitch and volume of music, the colors of a painting, or the position of a sprinter during a race.
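The point about sample rate can be illustrated with a minimal Python sketch. It assumes a sine wave stands in for an analog signal; the function names `analog_signal` and `worst_error` and the two sample rates are hypothetical choices made for this illustration and are not part of the lesson materials. The sketch holds each sample until the next one is taken and measures how far that stair-step approximation strays from the smooth curve.

```python
import math

def analog_signal(t):
    """Stand-in for a smoothly changing real-world value (one cycle per second)."""
    return math.sin(2 * math.pi * t)

def worst_error(samples_per_second, checks_per_second=1000):
    """Largest gap between the signal and a sample-and-hold copy over one second."""
    worst = 0.0
    for i in range(checks_per_second):
        t = i / checks_per_second
        # Time of the most recent sample taken at or before t
        # (integer arithmetic avoids floating-point rounding surprises)
        last_index = (i * samples_per_second) // checks_per_second
        last_sample_time = last_index / samples_per_second
        error = abs(analog_signal(t) - analog_signal(last_sample_time))
        worst = max(worst, error)
    return worst

low_rate_error = worst_error(4)
high_rate_error = worst_error(100)
print(f"4 samples per second:   worst error {low_rate_error:.3f}")
print(f"100 samples per second: worst error {high_rate_error:.3f}")
```

Running it with other rates lets students see that a higher sample rate gives a closer approximation, but never an exact copy of the analog signal.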

## Activity 3: How is meaning created from data? (10 minutes)

1. Look at some data gathered about selfies from different cities around the world. [slide 4]
• Main ideas:
• You have to gather the data and analyze it to create meaning.
• Creating meaning from pictures still takes some human interpretation.
• Digitally processed data can show a correlation between variables and a correlation found in data does not necessarily indicate that a causal relationship exists.
• A single source does not contain the data needed to draw a conclusion. It may be necessary to combine data from a variety of sources to formulate a conclusion.
• Prompt students to come to a conclusion about the graphed data on the page.
• Question for discussion: How large of a sample is needed to draw a conclusion?
2. Quick review: Make the point that there is a LOT of data even in a single picture. [slide 5]
1. Define these and put them in order. Use this webpage to review bytes: http://highscalability.com/blog/2012/9/11/how-big-is-a-petabyte-exabyte-zettabyte-or-a-yottabyte.html
• MB, bit, TB, ZB, byte, GB, pixel (one dot of color on the screen), KB, PB
2. Look at the photo on slide 5.
1. 365 gigapixels is 365 billion pixels. If the picture were a square, it would be about 604,152 pixels on each side (too big to fit on any HDTV screen).
2. A 4K ultra-high-definition TV is only 3,840 × 2,160 pixels (http://www.rtings.com/info/what-is-the-resolution).
3. Even a movie screen can’t show all of the detail (https://www.amctheatres.com/sony4k); you can only look at the photo one part at a time.
3. Preview Wolfram Alpha, an engine for providing knowledge from data.
• Show the introductory video: https://www.wolframalpha.com/tour/what-is-wolframalpha.html (1:18) [slide 6]
• Identify ways that patterns can emerge when data are transformed using programs.
• Experiment to demonstrate how insight and knowledge can be obtained from translating and transforming digitally represented information.
• Students will explore these sites in the next activity.
4. Point out that there are processes that can be used to extract or modify information from data in both beneficial and harmful ways. These processes include the following:
• machine learning and data mining
• transforming every element of a data set, such as doubling every element in a list, or extracting the parent’s email from every student record
• filtering a data set, such as keeping only the positive numbers from a list, or keeping only students who signed up for band from a record of all the students
• combining or comparing data in some way, such as adding up a list of numbers, or finding the student who has the highest GPA
• visualizing a data set through a chart, graph, or other visual representation
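The transforming, filtering, and combining processes listed above can be sketched in a few lines of Python. This is only an illustration: the student records, field names, and numbers below are invented and do not come from the lesson materials.

```python
# Hypothetical student records, invented for illustration only
students = [
    {"name": "Ana", "gpa": 3.8, "band": True, "parent_email": "ana.parent@example.com"},
    {"name": "Ben", "gpa": 3.2, "band": False, "parent_email": "ben.parent@example.com"},
    {"name": "Cy", "gpa": 3.5, "band": True, "parent_email": "cy.parent@example.com"},
]
numbers = [4, -2, 7, -5, 1]

# Transforming: apply the same change to every element
doubled = [n * 2 for n in numbers]
parent_emails = [s["parent_email"] for s in students]

# Filtering: keep only the elements that meet a condition
positives = [n for n in numbers if n > 0]
band_members = [s["name"] for s in students if s["band"]]

# Combining or comparing: reduce the whole data set to a single answer
total = sum(numbers)
top_student = max(students, key=lambda s: s["gpa"])["name"]

print(doubled, positives, total)
print(parent_emails, band_members, top_student)
```

The same three patterns apply whether the data set has five elements or five million, which is one reason programs are so useful for processing data.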

## Activity 4 (20 min) - Work with some data online

1. Students should complete the Data Search and Analysis Handout. [slide 7]
• Depending on how much time you have, you can pair students and assign even/odd questions or chunks of questions to different groups, or have each student research on their own.
2. If there’s time in class, try to go over results and compare (especially the first 5) to see if people got similar answers. Why or why not? [slide 8]

# Assign Homework (5 minutes)

Give students the worksheet: Homework Unit 4 Lesson 1.

There are 10 videos to choose from, each 10-15 minutes long. Either allow students to self-select, or assign them a particular video. Students should watch the video and answer the questions on the worksheet. This is an opportunity to discuss plagiarism: students are expected to watch the video and write from their own experience.

# Session 2

For this session, use the presentation: Finding and Analyzing Data from the Lesson Resources Folder

# Getting Started (5 min)

Students should journal on the following:  Describe at least 2 ways that we create meaning out of data. [slide 1]

• Possible answers: graph it, total it, average it, find min and max, map it, compare it to other data, find trends, generate predictions (like weather), draw conclusions (facial recognition, emotions, voice inflection), diagnose diseases, discover new stars, etc.

# Activities (40 min)

## Activity 1 (35 min): Analyzing Data

Part 1: Correlation vs. Causation

1. Look at slide 2 from the PowerPoint. Creating meaning from data can be misleading.
2. Point out that the graph shows a direct relationship between the number of divorces in Maine and the amount of margarine that is purchased. When one goes up, the other does too, and vice versa. Is this a causal relationship?
• Show some examples from the Tyler Vigen website http://www.tylervigen.com/spurious-correlations . It has many examples of data connections that may be statistically valid but don’t make sense.  The site was created to point out how comparisons due to data correlation are often not valid.
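If students ask how a correlation is actually measured, the following minimal Python sketch may help. It computes the Pearson correlation coefficient for two series of numbers. The data are invented for illustration (they are not the figures from the slide or from the Tyler Vigen site), and the function name `pearson_correlation` is a hypothetical helper.

```python
import math

def pearson_correlation(xs, ys):
    """Return a value from -1 to 1 describing how closely two series move together."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)

# Invented monthly values: both tend to rise in hot weather,
# but neither one causes the other
ice_cream_sales = [20, 25, 31, 38, 44, 52]
sunburn_cases = [11, 14, 15, 20, 22, 27]

r = pearson_correlation(ice_cream_sales, sunburn_cases)
print(f"correlation coefficient: {r:.2f}")
```

A coefficient close to 1 only says that the two series rise and fall together. It says nothing about why, which is the question students should keep asking.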

Part 2: Data Science

1. What does a data scientist do? [slide 3] Show the two videos and discuss.
2. Say: Tricks to analyzing big data:
1. Knowing what data to use, and what to disregard.
2. Dealing with nonuniform data, or data entered in a variety of formats.
3. Knowing how to clean data - to make the data conform to a format without changing its meaning.
4. Knowing how to use data filtering tools to find information, recognize patterns and predict trends.
3. Look at 3 false assumptions about big data [slide 4]:
1. It’s complete and accurate
2. It tells the whole story
3. Bigger is better
4. What considerations and tradeoffs arise in the computational manipulation of data? [slide 4]
1. How do you account for missing data?
2. How do you certify your sources?
3. How do you decide which data to include and which to exclude?
4. How much data is enough? The size of a data set affects the amount of information that can be extracted from it.
5. Are your processing algorithms accurate?
5. What is some of the data needed to successfully fly a space mission? (Possible answer: Knowing all about the spacecraft: speed, direction, amount of fuel/oxygen left.) Many of the problems that applied to early space missions are also faced in dealing with big data.
1. You need to decide which factors to include in your calculations, and which to exclude.
2. You need to decide when to make an assumption for missing data or when to estimate.
3. When writing a program for an early space flight, there are many unknown factors because the spacecraft has never flown before.
4. It’s usually impossible to create a perfect algorithm that can take into account every possibility, so how do you allow for errors and changes?
6. What are some of the calculations needed? (Possible answers: how much fuel to release and with which engines.)
• They had to run many simulations first to see what would happen under various circumstances.
7. See if anybody knows how Netflix, movie makers, or Amazon use data about their customers to be more successful. [slide 5] http://www.smartdatacollective.com/bernardmarr/312146/big-data-how-netflix-uses-it-drive-business-success and http://www.fastcompany.com/3024655/pitch-perfect-and-how-analytics-are-transforming-movie-marketing

Businesses like Amazon and Netflix learn the habits of different customers and make recommendations based on their previous choices and on the choices of others who share similar characteristics (like Google ads).

See if anybody knows the story of Moneyball (based on a true story), in which a baseball team made decisions based on data analysis to become winners: https://en.wikipedia.org/wiki/Moneyball_(film). Also see if anybody knows how Vivek Ranadivé, who knew little about basketball but owned a multi-million dollar computer processing company and knew how to choose and analyze data, coached his then twelve-year-old daughter’s National Junior Championship basketball team to the national championship game. He relied upon his sporting knowledge of soccer and cricket, paired with his analytic mindset, to create a system of play that allowed his relatively un-athletic team to excel. After using his intellect and business experience to coach an inexperienced team to the championship game, the man who once thought basketball was “mindless” was hooked on the sport. http://www.newyorker.com/magazine/2009/05/11/how-david-beats-goliath

1. How is data analyzed? Data analysis requires an algorithm, a plan to collect and process data. [slide 6]
1. Generate a discussion about what data is collected and how it is analyzed. What is a possible algorithm for choosing which movies Netflix might suggest to a customer?
Brainstorm: what other data might they collect? (what’s currently popular in that age group, demographic, etc.)
2. Choose one of the options and write an outline of an algorithm: choosing a movie to produce or a sports player to hire. [slide 7]
1. Describe at least two calculations needed
2. Describe some of the data you’d need to collect.
3. Describe how the data sets needed could pose challenges regardless of size, such as:

• the need to clean data
• incomplete data
• invalid data
• the need to combine data sources
Share and discuss.
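To make the data challenges above concrete during the discussion, here is a minimal Python sketch of cleaning a small data set. Everything in it is an assumption made for illustration: the survey rows are invented, and `clean_row` is a hypothetical helper, not part of the lesson materials. It tidies values entered in different formats without changing their meaning, and sets aside rows that are incomplete or invalid instead of guessing.

```python
# Hypothetical survey rows, invented for illustration: the same kind of
# information entered in several formats, with some values missing or invalid
raw_rows = [
    {"name": " alice ", "age": "17"},
    {"name": "BOB", "age": "sixteen"},   # invalid: not a number
    {"name": "Cara", "age": ""},         # incomplete: missing value
    {"name": "dev", "age": " 18 "},
]

def clean_row(row):
    """Return a tidied copy of the row, or None if the age cannot be used."""
    name = row["name"].strip().title()
    age_text = row["age"].strip()
    if not age_text.isdigit():
        return None
    return {"name": name, "age": int(age_text)}

cleaned = []
rejected = []
for row in raw_rows:
    result = clean_row(row)
    if result is None:
        rejected.append(row)   # keep for follow-up rather than silently dropping
    else:
        cleaned.append(result)

print("usable rows:", cleaned)
print("rows needing follow-up:", len(rejected))
```

Deciding what to do with the rejected rows (discard, estimate, or go back to the source) is exactly the kind of trade-off discussed earlier in this activity.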

## Activity 2 (5 min):

Present homework from the previous day after watching TED talks on data. [slide 8]

Summarize all of the questions from the homework to be presented to the class and collect the written summaries to grade.

## Journal (5 min)

In your writing journal, map out the steps to answer a specific question or find a solution to solve a specific problem using data.

## Guidance for Practice Questions - Question Set 17

Questions in the AP Classroom Question Bank may be used for summative purposes.

Sixty of the 80 questions are restricted to teacher access.  The remaining 20 questions are from public resources.

Questions are identified by their initial phrases.

A student is recording a song on her computer. ...

An Internet service provider (ISP) is considering

Digital images are often represented by the red...

Which of the following is a true statement abou...

## Options for Differentiated Instruction

### Extension Activities:

Data analysis activities from NOAA, NASA, and more! - http://climate-expeditions.org/educators/activities.html

### Differentiated Instruction:

What is data acquisition? - http://www.ni.com/data-acquisition/what-is/

Data analysis and graphs (with Excel sample) - http://www.sciencebuddies.org/science-fair-projects/project_data_analysis.shtml

Collecting and analyzing data - http://ctb.ku.edu/en/table-of-contents/evaluate/evaluate-community-interventions/collect-analyze-data/main

Using Excel for Handling, Graphing, and Analyzing Scientific Data: A Resource for Science and Mathematics Students - http://academic.pgcc.edu/psc/Excel_booklet.pdf

## Formative Assessment

Journal day 1:

Given this fictitious data:

A blood drive at the local high school reveals that 20% of the students were HIV positive.

• What is your immediate reaction?
• What questions do you have?

Journal day 2:  Describe at least 2 ways that we create meaning out of data.

Homework: Feedback from a TED video on big data

## Summative Assessment

Students complete the Data Search and Analysis student activity.

Write an outline of an algorithm to make a data-based decision about what movie to produce or what sports team member to hire.

## Lesson Summary

Summary

Students will define and identify models and simulations. They will work in groups to propose a simulation that could be used to investigate a hypothesis.

Outcomes

• Students will identify real-world examples of models and simulations.
• Students will understand that models and simulations are used to generate new knowledge, as well as to formulate, refine, and test hypotheses.
• Students will understand that simulations allow hypotheses to be tested without the constraints of the real world.
• Students will provide examples of how simulations are used in an iterative and interactive way when processing information to allow users to gain insight and knowledge about data.
• Students will understand that the use of digital data to approximate real-world analog data is an example of abstraction.

Overview

1. Getting Started (5 min)
2. Introduction to Content (10 min)
3. Guided Activities (30 min)
1. Define and Identify Models and Simulations [10 min]
2. Use Models and Simulations to Answer Questions [20 min]
4. Wrap Up (5 min)

Source

Some of the ideas in this lesson were adapted from the CS10K community site, https://sites.google.com/site/mobilecsp/lesson-plans/realworldmodels

## CSP Objectives

• EU DAT-1 - The way a computer represents data internally is different from the way the data is interpreted and displayed for the user. Programs are used to translate data into a representation more easily understood by people.
• LO DAT-1.A - Explain how data can be represented using bits.
• EU DAT-2 - Programs can be used to process data, which allows users to discover information and create new knowledge.
• LO DAT-2.E - Explain how programs can be used to gain insight and knowledge from data.
• EU AAP-3 - Programmers break down problems into smaller and more manageable pieces. By creating procedures and leveraging parameters, programmers generalize processes that can be reused. Procedures allow programmers to draw upon existing code that has already been tested, allowing them to write programs more quickly and with more confidence.
• LO AAP-3.F - For simulations: a. Explain how computers can be used to represent real-world phenomena or outcomes. b. Compare simulations with real-world contexts.

## Math Common Core Practice:

• MP4: Model with mathematics.

## Common Core ELA:

• RST 12.7 - Integrate and evaluate multiple sources of information presented in diverse formats and media
• RST 12.8 - Evaluate the hypotheses, data, analysis, and conclusions in a science or technical text

## NGSS Practices:

• 2. Developing and using models

## Key Concepts

• Models and simulations are used to generate new knowledge, as well as to formulate, refine, and test hypotheses.
• Simulations allow hypotheses to be tested without the constraints of the real world.
• The use of digital data to approximate real-world analog data is an example of abstraction.

## Essential Questions

• How can computational models and simulations help generate new understanding and knowledge?

## Teacher Resources

Student computer usage for this lesson is: optional

These videos supplement the material covered in this lesson:

# Getting Started (5 min)

• Journal: If I flip a coin 10 times, is it possible to predict exactly how many times it will come up heads? Why or why not?
• A weather forecaster presented a forecast with a 20% chance of precipitation the next day.  The next day it rained.  Explain how the forecast may still have been correct.
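After students have journaled, the coin-flip prompt can be explored with a short Python sketch like the one below. It is one possible illustration, not part of the original lesson: the function names `count_heads` and `run_trials` and the choice of 1,000 trials are assumptions. It repeats the 10-flip experiment many times and tallies how often each number of heads occurs.

```python
import random

def count_heads(flips=10):
    """Flip a simulated fair coin `flips` times and return the number of heads."""
    return sum(random.randint(0, 1) for _ in range(flips))

def run_trials(trials=1000, flips=10):
    """Repeat the experiment and count how often each number of heads appears."""
    tally = [0] * (flips + 1)
    for _ in range(trials):
        tally[count_heads(flips)] += 1
    return tally

tally = run_trials()
for heads, times in enumerate(tally):
    print(f"{heads:2d} heads: {times}")
```

Each run prints different counts, which supports the discussion: no single set of 10 flips can be predicted exactly, yet a pattern emerges over many repetitions.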

# Introduction to Content (10 min)

Introduce Vocabulary

Choose one of the simulations at PhET simulations and answer the following.

• What models are being used?
• What details are included?
• What details are omitted?
• What does the simulation seem to show?
• How is the simulation used in an iterative and interactive way when processing information to allow users to gain insight and knowledge about data?

View these two videos

Bill Nye and a scaled model of the solar system (4:17)

Computer Generated Model of a Solar System (2:41)

# Guided Activities (30 min)

Students create a journal entry responding to these two questions:

1. What was a main idea presented by each video?
2. What aspect(s) of the models helped make that point?

Students discuss each of the following with elbow partners then groups.

1. How do the models in these videos depend on computing?
2. Consider the strengths and weaknesses of each model. What understanding can be better drawn from the first model and what understanding can better be drawn from the second?
3. What questions could be answered using these two models?
4. How does the use of digital data approximate real-world analog data? (Point out that this is another example of abstraction.)

From each group students share at least one response to each prompt.

## Define and Identify Models and Simulations [10 min]

Examples of models (do not need to show the entire videos for student understanding):

• Watch this video of a human heart simulation: Multi-scale Multi-physics Heart Simulator UT-Heart (5:15) (watch up to 2:00; the rest is interesting but not necessary).
• What’s an advantage to having so many data points? What about a disadvantage? (A supercomputer is necessary to run the simulation. Again, point out that this is an abstraction because the digital data is representing what is in the real world)
• How can you test a parachute to be used on Mars? https://www.youtube.com/watch?v=_jOzxEOlDJg (1:11) Describe the physical test. Before that test, they create models and simulate on the computer. Why? (It is very costly to run a test and to create an actual parachute. First be sure an idea passes a simulated test, then build it.)

Examples of Simulations:

Have students find and share simulations in each of the following:

• Financial (e.g., stock market forecasting)
• Weather (e.g., predicting the path of hurricanes)
• Space (e.g., predicting the path of an asteroid)
• Sports (e.g., predicting championships)

## Use Models and Simulations to Refine Questions [20 min]

• Select one of the simulations explored today.
• Write a question the simulation could help answer.
• Run the simulation and write an answer to your question.
• Exchange your results with your elbow partner.
• Refine your elbow partner’s question.
• Write an answer to the new question.

# Wrap Up (5 min)

Journal: Have students record the definitions (in their own words) of the vocabulary used in this lesson: probability, model, simulation, and hypothesis.

## Formative Assessment

• Can students define models and simulations in their own words (and understand the difference)?
• During the activity, are students able to identify particular characteristics that will be included in a model and simulation as well as characteristics that are to be excluded?

## Lesson Summary

Summary

Students will formulate a hypothesis, run simulations, and analyze the results to determine what needs to be modified in their hypothesis and/or the simulation itself.

Outcomes

• Students will identify and create real-world examples of models and simulations.
• Students will be introduced to using random numbers in both Python and the exam reference sheet.
• Students will use models and simulations to generate new knowledge and to formulate, refine, and test hypotheses.
• Students will use simulations to test hypotheses without the constraints of the real world.

Overview

1. Getting Started (10 min)
2. Guided Activities (35 min)
1. Activity 1 - Rolling Dice Simulation, materials - large paper and markers or other method to share group work [15 minutes]
2. Activity 2 - Using a Simulation to Test a Hypothesis [20 minutes]
3. Wrap Up (5 min)

Source

The coin flipping extension is based on a CS10K lesson: https://sites.google.com/site/mobilecsp/lesson-plans/lp-coinflip-miniprojects

## CSP Objectives

• EU DAT-2 - Programs can be used to process data, which allows users to discover information and create new knowledge.
• LO DAT-2.D - Extract information from data using a program.
• LO DAT-2.E - Explain how programs can be used to gain insight and knowledge from data.
• EU AAP-1 - To find specific solutions to generalizable problems, programmers represent and organize data in multiple ways.
• LO AAP-1.A - Represent a value with a variable.
• LO AAP-1.B - Determine the value of a variable as a result of an assignment.
• EU AAP-2 - The way statements are sequenced and combined in a program determines the computed result. Programs incorporate iteration and selection constructs to represent repetition and make decisions to handle varied input values.
• LO AAP-2.H - For selection: a. Write conditional statements. b. Determine the result of conditional statements.
• EU AAP-3 - Programmers break down problems into smaller and more manageable pieces. By creating procedures and leveraging parameters, programmers generalize processes that can be reused. Procedures allow programmers to draw upon existing code that has already been tested, allowing them to write programs more quickly and with more confidence.
• LO AAP-3.A - For procedure calls: a. Write statements to call procedures. b. Determine the result or effect of a procedure call.
• LO AAP-3.E - For generating random values: a. Write expressions to generate possible values. b. Evaluate expressions to determine the possible results.
• LO AAP-3.F - For simulations: a. Explain how computers can be used to represent real-world phenomena or outcomes. b. Compare simulations with real-world contexts.

## Math Common Core Practice:

• MP2: Reason abstractly and quantitatively.
• MP3: Construct viable arguments and critique the reasoning of others.

## Common Core Math:

• F-BF.1-2: Build a function that models a relationship between two quantities
• S-ID.1-4: Summarize, represent, and interpret data on a single count or measurement variable

## Common Core ELA:

• RST 12.7 - Integrate and evaluate multiple sources of information presented in diverse formats and media
• RST 12.8 - Evaluate the hypotheses, data, analysis, and conclusions in a science or technical text
• RST 12.9 - Synthesize information from a range of sources

## NGSS Practices:

• 3. Planning and carrying out investigations
• 4. Analyzing and interpreting data
• 5. Using mathematics and computational thinking
• 6. Constructing explanations (for science) and designing solutions (for engineering)
• 8. Obtaining, evaluating, and communicating information

## Key Concepts

Students will be able to:

• identify and create real-world examples of models and simulations.
• use models and simulations to generate new knowledge and to formulate, refine, and test hypotheses.
• use simulations to test hypotheses without the constraints of the real world.

## Essential Questions

• How can computational models and simulations help generate new understanding and knowledge?
• How can computation be employed to help people process data and information to gain insight and knowledge?
• How are algorithms implemented and executed on computers and computational devices?
• How are programs developed to help people, organizations or society solve problems?
• How are programs used for creative expression, to satisfy personal curiosity or to create new knowledge?
• How do computer programs implement algorithms?
• How do people develop and test computer programs?
• Which mathematical and logical concepts are fundamental to computer programming?

## Teacher Resources

Student computer usage for this lesson is: required

The PowerPoint "Using Data and Simulations" can be found in the Lesson Resources folder.

Penny Bias article to go with lesson extension: http://mathtourist.blogspot.com/2011/02/penny-bias.html

For the Monty Hall Problem extension:

Online simulation of the problem: http://math.ucsd.edu/~crypto/cgi-bin/MontyKnows/monty2?1+17427

There are several videos on YouTube demonstrating and explaining the Monty Hall Problem.

An animated video: https://www.youtube.com/watch?v=mhlc7peGlGg length is 5:48

Live action video: https://www.youtube.com/watch?v=4Lb-6rxZxx0 length is 5:30

There is sample code for the die Python program in the Lesson Resources Folder called 4-3 Sample Code.py

# Getting Started (10 min)

Journal:

• Models and simulations are simplified representations of more complex objects or phenomena.
• What is a complex, everyday phenomenon that is studied using models and simulations?
• Models often omit unnecessary features of the objects or phenomena that are being modeled.
• What features might a flight simulator program leave out to make the program easier to use and run faster?
• Simulations mimic real-world events without the cost or danger of building and testing the phenomena in the real world.
• What is something random that could be tested more easily with a computer than by building the real thing?

Have students share their answers with the class.

# Guided Activities (35 min)

## Activity 1 (15 min) - Designing a Simulation (rolling dice)

1. Explain that many simulations include randomized events to simulate reality. Using random number generation in a program means each execution may produce a different result, which allows the program to simulate a large number of seemingly random events.
• What unnecessary details can be omitted if you need to create a random outcome?
• [Possible answers: you only need to choose a random number and then use selection to choose which block of code to execute as a result]
2. Arrange the class into small groups (two or three).
3. The groups are to develop an algorithm and write a program to simulate rolling a standard six-sided die.
• The program should ask for the number of rolls and display the results (number of each possible outcome that was rolled).
• Note: The students will need to use the randint function to produce a series of differing "random" values.
• `import random` at the top of the code
• `random.randint(min,max)` returns a 'random' integer between the min and max values (inclusive).
• The exam reference sheet includes a RANDOM(a, b) procedure which returns a random integer between a and b, inclusive.
• Remind students to use functions where appropriate.
• Students are encouraged to write their plan (flow chart or pseudocode) on large paper or some other method to enable later sharing with the class.
4. Have a representative from each group describe their plan for the algorithm. (After the first group, subsequent groups could highlight similarities and differences.)
5. Next, have each group discuss and implement changes to their algorithm and programs to simulate rolling two dice. (When providing directions, ask the class how using functions in the previous exercise would make this change easier to implement.)
6. As a class, discuss how many times the program should "roll" the dice. (Be sure students note that every possible value should be displayed, especially upper and lower bounds.)
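For teachers who want a reference point while groups work, the following is one possible Python sketch of the dice simulation. It is not the sample code from the Lesson Resources Folder; the function names `roll_die` and `tally_rolls` are hypothetical. To keep the sketch short, the number of rolls is passed in directly, whereas a classroom version would read it from the user with `input()`.

```python
import random

def roll_die(sides=6):
    """Simulate one roll; random.randint includes both endpoints,
    like RANDOM(1, 6) on the exam reference sheet."""
    return random.randint(1, sides)

def tally_rolls(number_of_rolls, dice=1, sides=6):
    """Count how often each total appears when rolling `dice` dice together."""
    # Start every possible total at zero so unrolled values still show up
    counts = {total: 0 for total in range(dice, dice * sides + 1)}
    for _ in range(number_of_rolls):
        total = sum(roll_die(sides) for _ in range(dice))
        counts[total] += 1
    return counts

one_die = tally_rolls(600)
two_dice = tally_rolls(3600, dice=2)
for total, count in two_dice.items():
    print(f"{total:2d}: {count}")
```

Because `roll_die` is its own function, moving from one die to two only changes how many times it is called, which is the point of the question about functions in step 5.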

## Activity 2 (20 min) - Using a Simulation to Test a Hypothesis

1. Each group should develop a hypothesis to answer the question: Can a twelve-sided die be used in place of rolling two six-sided dice? Each group should agree on a hypothesis and write it down.
2. The groups already have a design for rolling two six-sided dice. They should now design an algorithm for rolling a twelve-sided die. (This process can involve creating a new function or modifying the current function's parameters to accept the number of sides.)
3. The students should be paired up and begin writing the code necessary to test the hypothesis.
4. Once they finish the code, students should run both simulations to compare results. Note: Students need to save their code. They will be using it in the next lesson's homework.
5. As students complete their simulations, if possible, have each pair share their data with the class (in a spreadsheet displayed by the teacher's computer, or on the white board, or large paper).
6. As data is collected, students should determine whether their hypothesis was correct or whether it needs to be modified.
7. Analyze with the class if the simulations themselves were implemented correctly. (Does the data represent the expected theoretical outcomes? How can you modify the program to correct any errors?)
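
A minimal sketch of the comparison follows, assuming a helper named `roll` and a fixed run of 10,000 rolls (both are illustrative choices, not requirements of the activity). It tallies the totals of two six-sided dice alongside the faces of one twelve-sided die so the two distributions can be read side by side.

```python
import random


def roll(sides):
    """Return a random integer from 1 to sides, inclusive."""
    return random.randint(1, sides)


def simulate(number_of_rolls):
    """Tally two-dice totals (2 to 12) and twelve-sided results (1 to 12)."""
    two_dice = {total: 0 for total in range(2, 13)}
    one_die = {face: 0 for face in range(1, 13)}
    for _ in range(number_of_rolls):
        two_dice[roll(6) + roll(6)] += 1
        one_die[roll(12)] += 1
    return two_dice, one_die


two_dice, one_die = simulate(10000)
print("value  two d6  one d12")
for value in range(1, 13):
    # A total of 1 is impossible with two dice, so .get() supplies 0 for it.
    print(value, two_dice.get(value, 0), one_die[value])
```

When checking whether a simulation was implemented correctly, students can compare against the theoretical outcomes: two six-sided dice can never total 1 and produce 7 most often, while every face of a fair twelve-sided die is equally likely.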

## Wrap Up (5 min)

Ask the class: What are the advantages/disadvantages of using a program vs. actual dice?

How quickly can the computer generate thousands of test cases?

Can the computer be used to analyze the test cases as well as to generate the random numbers?

Journal: Summarize how a program can be used as a simulation to test a hypothesis.

## Guidance for Practice Questions - Question Set 18

Questions in the AP Classroom Question Bank may be used for summative purposes.

Sixty of the 80 questions are restricted to teacher access.  The remaining 20 questions are from public resources.

Questions are identified by their initial phrases.

A car manufacturer uses simulation software dur...

The algorithm below is used to simulate the res...

## Options for Differentiated Instruction

Students can be provided with the code for a function to simulate rolling one die and use it to develop the rest of the program.

After the first group activity, the teacher can swap a student from each group to allow different input into the next group activity.

### Extension #1 - Penny Flipping

This extension is based on advanced mini project # 4, which can be found here:  https://docs.google.com/a/smcps.org/document/d/1AKHpiQ87bE4W1YzHlAFh2uNAHuEtdMOCQVV6HfxfDzc/edit

Students read an article about the 'randomness' of flipping a penny: http://mathtourist.blogspot.com/2011/02/penny-bias.html

Next, students should hypothesize the results of lining up 10 pennies on edge and knocking them over (as described in the article). Students need to determine how many times to run the experiment, collect data, and analyze the results.

Students should work in pairs to write a computer simulation for the penny experiment. (Note: this is a program based on experimental data, not theoretical.)

Discuss as a class the validity of the simulation written. Can this simulation be used for other coins?

### Extension #2 - The Monty Hall Problem

In the game show "Let's Make a Deal", the original host was Monty Hall. On every show, Monty would present a player with three doors or curtains to choose from. The contestant was asked to choose a door in search of a prize. After the contestant made a selection, Monty Hall would open one of the doors not selected by the contestant to reveal a non-prize (perhaps a goat). Then Monty would ask if the contestant wanted to change their choice.

After explaining the show to the class ask, "Should the contestant change?" Students should propose a hypothesis.

Have the students design a simulation to test their hypothesis (discuss what is the data collected and the number of times the simulation should run to collect data). After running the simulation, students should evaluate their hypothesis and determine whether it needs to be modified or whether the simulation needs to be modified.
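
For teacher reference, one way such a simulation could be structured is sketched here. The function names, the default of 10,000 rounds, and the `doors` parameter are assumptions made for illustration. The sketch also assumes Monty always knows where the prize is and that a switching contestant picks at random among the remaining closed doors.

```python
import random


def play_round(switch, doors=3):
    """Play one round and return True if the contestant wins the prize."""
    prize = random.randrange(doors)
    choice = random.randrange(doors)
    # Monty knowingly opens a door that is neither the choice nor the prize.
    opened = random.choice(
        [d for d in range(doors) if d != choice and d != prize]
    )
    if switch:
        # Move to a closed door other than the original choice.
        choice = random.choice(
            [d for d in range(doors) if d != choice and d != opened]
        )
    return choice == prize


def win_rate(switch, rounds=10000, doors=3):
    """Return the fraction of rounds won with the given strategy."""
    wins = sum(play_round(switch, doors) for _ in range(rounds))
    return wins / rounds


print("stay:  ", win_rate(switch=False))
print("switch:", win_rate(switch=True))
```

With three doors, the stay strategy should settle near 1/3 and the switch strategy near 2/3 as the number of rounds grows. Passing a different value for `doors` lets students rerun the experiment with more doors.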

If Monty Hall had four doors, what should the contestant do?

What should the contestant do if they know that Monty does not know what is behind each door?

Online simulation of the problem: http://www.math.ucsd.edu/~crypto/Monty/monty.html

There are several videos on YouTube demonstrating and explaining the Monty Hall Problem.

An animated video: https://www.youtube.com/watch?v=mhlc7peGlGg length is 5:48

Live action video: https://www.youtube.com/watch?v=4Lb-6rxZxx0 length is 5:30

## Formative Assessment

Review student journal entries and class discussions to determine students' understanding of simulations, a hypothesis, and the ability to determine a method to test a hypothesis.

## Summative Assessment

Describe an algorithm to simulate drawing an ace of any suit from a standard deck of cards.

Make a hypothesis about drawing cards from a standard deck of cards and determine how to collect data to answer your hypothesis.
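
As a reference point when reviewing student answers, the sketch below shows one possible algorithm for the ace question. The deck representation and the function names are illustrative assumptions; students may describe equally valid approaches, such as picking a random integer from 1 to 52.

```python
import random

RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUITS = ["clubs", "diamonds", "hearts", "spades"]
DECK = [(rank, suit) for rank in RANKS for suit in SUITS]  # 52 cards


def draw_is_ace():
    """Draw one card at random from a full deck; return True if it is an ace."""
    rank, suit = random.choice(DECK)
    return rank == "A"


def ace_rate(draws):
    """Return the fraction of single-card draws that produced an ace."""
    aces = sum(draw_is_ace() for _ in range(draws))
    return aces / draws


print("fraction of aces in 10000 draws:", ace_rate(10000))
print("theoretical value:", 4 / 52)
```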

## Lesson Summary

Summary

This lesson introduces students to reading information from an input file and writing to an output file as a functionality of Python programming as an example of using a data source to gain insight. The students will then apply these concepts to program a simple Dice Roll application to generate data. This lesson will prepare students to read and write files for use in later Data Acquisition lessons.

## CSP Objectives

• EU CRD-2 - Developers create and innovate using an iterative design process that is user-focused, that incorporates implementation/feedback cycles, and that leaves ample room for experimentation and risk-taking.
• LO CRD-2.B - Explain how a program or code segment functions.
• LO CRD-2.C - Identify input(s) to a program.
• EU DAT-2 - Programs can be used to process data, which allows users to discover information and create new knowledge.
• LO DAT-2.D - Extract information from data using a program.
• LO DAT-2.E - Explain how programs can be used to gain insight and knowledge from data.
• EU AAP-1 - To find specific solutions to generalizable problems, programmers represent and organize data in multiple ways.
• LO AAP-1.A - Represent a value with a variable.
• LO AAP-1.B - Determine the value of a variable as a result of an assignment.
• EU AAP-2 - The way statements are sequenced and combined in a program determines the computed result. Programs incorporate iteration and selection constructs to represent repetition and make decisions to handle varied input values.
• LO AAP-2.B - Represent a step-by-step algorithmic process using sequential code statements.
• LO AAP-2.H - For selection: a. Write conditional statements. b. Determine the result of conditional statements.
• LO AAP-2.K - For iteration: a. Write iteration statements. b. Determine the result or side-effect of iteration statements.

## Key Concepts

Outcomes

• Students will learn how to read from an input data file and write to an output file using the Python programming language.
• Students will consider how combining data sources, clustering data, and classifying data are parts of the process of using programs to gain insight and knowledge from data.
• Students will be able to apply their new knowledge to writing independent programs.

## Essential Questions

• How can computation be employed to help people process data and information to gain insight and knowledge?
• How do people develop and test computer programs?

## Teacher Resources

Student computer usage for this lesson is: required

Python for Everybody by Charles Severance, http://do1.dr-chuck.com/pythonlearn/EN_us/pythonlearn.pdf.

Explanation of the CountIf function in Excel https://support.office.com/en-us/article/COUNTIF-function-e0de10c6-f885-4e71-abb4-1f464816df34.

The mbox.txt and mbox-short.txt files are in the Lesson Resources Folder.

# Getting Started (10 min)

Think-Pair-Share

• List some pros and cons of inputting data for a program using only a keyboard.
• List some pros and cons of displaying output of a program using only video display.

Have students review their journal entries as a class and note the advantages and disadvantages on a white board.

# Guided Activity (40 min)

## Before Starting The Activity:

• Have the students open the book Python for Everybody by Charles Severance http://do1.dr-chuck.com/pythonlearn/EN_us/pythonlearn.pdf and navigate to Chapter 7 "Files" (page 79).
• Students should also open PyCharm or the Runestone coding environment and create a new .py file. (Teachers may want to project PyCharm and the textbook on the board.)
• The two files mbox.txt and mbox-short.txt are located in the Lesson Resources Folder. Students will need to have these saved in the same folder as their python file for the lesson.

## Chapter 7: Files

The students should code the examples in the book as the teacher proceeds through the lessons.

1. Section 7.1 Persistence
    1. Emphasize the disadvantages of the loss of information when the computer is powered off and the frequent need to store data in a more permanent location.
    2. Introduce the concept of secondary memory.
2. Section 7.2 Opening Files
    1. After reading the brief text in Section 7.2, have students type the sample code to open the mbox.txt file and run it.
    2. Have students try to open a nonexistent file and see what kind of error they get.
    3. Check the students’ code for understanding:
        • Were the students able to open the mbox.txt file successfully?
        • Did the students get a “No such file or directory” error message when they attempted to open a nonexistent file?
3. Section 7.3 Text Files and Lines
    1. Introduce the newline (“end of line”) special character, which breaks the file into lines. In Python the newline is represented by “\n” in string constants.
    2. Have the students type in the sample code in 7.3 (page 81), which demonstrates the newline character’s function.
    3. Check the students’ code for understanding:
        • Did the “\n” cause the string to be displayed on two lines?
4. Section 7.4 Reading Files
    1. Introduce the notion of reading each line in a file by using a for loop. (The students should already be familiar with for and while loops, but may need a quick review of the syntax.)
    2. Have students count the lines in the mbox.txt file by running the first sample code in 7.4 (page 82).
    3. Check the students’ code for understanding:
        • Does your code give the same output as in the book?
5. Section 7.8 Writing Files
    1. Show the students how to open a file for writing by opening it with mode 'w' as a second parameter.
    2. After reading the brief text in Section 7.8, have students type the sample code to write to the output.txt file and run it. Remind them to use the newline character, “\n”, at the end of each line.
    3. Check the students’ code for understanding:
        • Were the students able to write to the output.txt file successfully?
        • Did the students remember to close their output file when finished?
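
The file operations covered in these sections can be sketched together as follows. This is not the book's sample code. It writes its own three-line file so that it runs without mbox.txt, and it places that file in a temporary folder (an assumption made so the sketch runs anywhere), whereas students would simply open "output.txt" next to their program.

```python
import os
import tempfile

path = os.path.join(tempfile.gettempdir(), "output.txt")

# Mode 'w' creates the file, or erases it first if it already exists.
fout = open(path, "w")
fout.write("first line\n")   # each write needs its own "\n"
fout.write("second line\n")
fout.write("third line\n")
fout.close()                 # closing makes sure the data is saved

# Reopen the file for reading and count its lines with a for loop.
fhand = open(path)
count = 0
for line in fhand:
    count = count + 1
fhand.close()
print("Line Count:", count)

# Opening a file that does not exist raises FileNotFoundError.
missing = os.path.join(tempfile.gettempdir(), "no-such-file-for-this-lesson.txt")
try:
    open(missing)
    opened_missing = True
except FileNotFoundError as error:
    opened_missing = False
    print("Could not open file:", error)
```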

# Python Lab (50 min)

### Discussion:

Discuss what possible data sources might be used as input to programs. Brainstorm ways that combining data sources, clustering data, and classifying data are parts of the process of using programs to gain insight and knowledge from data. Point out to students that data doesn't always fall from trees in exactly the quantity, quality, and format needed for your program. A programmer needs to be a critical consumer of data and know when to use multiple sources, clean the data, or sort through to find the right breadth and variety of data needed.

## Part 1:

• Open the Python program(s) for rolling two six-sided dice and one twelve-sided die (students should have saved this from lesson 4-3).
• Edit the code so it rolls the dice 1000 times and writes the result of each roll to an output file named diceRolls.dat.
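
A hedged sketch of Part 1 is shown below. The lesson does not prescribe a file layout, so the comma-separated columns and the header row are assumptions chosen to make the spreadsheet import straightforward. The function name `write_rolls` is illustrative, and the temporary folder is used only so the sketch runs anywhere.

```python
import os
import random
import tempfile


def write_rolls(path, number_of_rolls=1000):
    """Write one line per roll: the two-dice total, a comma, the d12 result."""
    fout = open(path, "w")
    fout.write("two_six_sided,one_twelve_sided\n")  # header row (an assumption)
    for _ in range(number_of_rolls):
        two_dice = random.randint(1, 6) + random.randint(1, 6)
        one_die = random.randint(1, 12)
        fout.write(str(two_dice) + "," + str(one_die) + "\n")
    fout.close()


# Students would simply pass "diceRolls.dat" to save next to their program.
path = os.path.join(tempfile.gettempdir(), "diceRolls.dat")
write_rolls(path)
print("Wrote", path)
```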

## Part 2:

• Import the diceRolls.dat file into your spreadsheet software (such as Excel) as either a text or CSV file.
• Make use of the `countif` function to compare the distribution of the rolls for how many times each number 2 through 12 was rolled with the pair of six-sided dice to the distribution for the 12-sided die.
• Show the comparison visually with an appropriate chart in your spreadsheet software.

## Options for Differentiated Instruction

Have students work in pairs as the new concepts are introduced and practiced.

For a class needing more scaffolding: Work as a group. Have students take turns around the room to read aloud the brief text in each section in Chapter 7. Do the short exercises together with a "row captain" assigned to each row (or group) in the classroom who is in charge of checking that everybody in their row has completed each short task and has gotten the help needed to finish. Row captains help each other until the entire class has successfully completed each task. Report out on what challenges were encountered, recording problems and solutions at the front of the classroom as the class works. Rotate the role of row captain for each section.

For more independent students: Introduce/demonstrate the key ideas first and then allow students to work through Chapter 7 at their own pace.

## Formative Assessment

The teacher will check the student’s code for understanding.

The teacher will check for understanding as each new concept is introduced.

## Summative Assessment

Exercise 7.1 Write a program to read through a file and print the contents of the file (line by line) all in upper case. Executing the program will look as follows:

python shout.py

Enter a file name: mbox-short.txt

FROM STEPHEN.MARQUARD@UCT.AC.ZA SAT JAN 5 09:14:16 2008

RETURN-PATH: <POSTMASTER@COLLAB.SAKAIPROJECT.ORG>

RECEIVED: FROM MURDER (MAIL.UMICH.EDU [141.211.14.90])

BY FRANKENSTEIN.MAIL.UMICH.EDU (CYRUS V2.3.8) WITH LMTPA;

SAT, 05 JAN 2008 09:14:16 -0500

You can download the sample input file from https://www.py4e.com/code3/mbox-short.txt

Exercise 7.2 Write a program to prompt for a file name, and then read through the file and look for lines of the form:

X-DSPAM-Confidence: 0.8475

When you encounter a line that starts with “X-DSPAM-Confidence:”, pull apart the line to extract the floating point number on the line. Count these lines and then compute the total of the spam confidence values from these lines. When you reach the end of the file, print out the average spam confidence.

Enter the file name: mbox.txt

Average spam confidence: 0.894128046745

Enter the file name: mbox-short.txt

Average spam confidence: 0.750718518519

Test your file on the mbox.txt and mbox-short.txt files.
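
If students get stuck on "pulling apart" a line in Exercise 7.2, the idea can be illustrated on a single string. The sketch below uses the example line from the exercise and slices off the prefix. It is only one of several reasonable approaches (`split` or `find` would also work), and it deliberately leaves the file loop and the averaging to the student.

```python
line = "X-DSPAM-Confidence: 0.8475\n"  # the example line, as read from a file
prefix = "X-DSPAM-Confidence:"

if line.startswith(prefix):
    # Keep everything after the prefix, then trim the space and the newline.
    text_after_prefix = line[len(prefix):].strip()
    value = float(text_after_prefix)
    print(value)
```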

## Lesson Summary

Pre-lesson Preparation

Your students will need computers for this lesson. If you would like to show students a working dartboard simulation (with a circular dartboard), check that your browser can run a Java plug-in. Be sure to update, activate, and disable the plug-in as needed for security purposes.

Summary

In this lesson, students will explore basic data analysis concepts in Python, learn about code extensibility, create a simple simulation from scratch, and reuse their code to make a more elaborate simulation.

Outcomes

• Students will create a mathematical simulation and understand how programming can be used to model real-world processes.
• Students will understand extensibility and code reuse by developing a simulation and modifying it to solve a more complex task.
• Students will be able to reason about and solve a problem by programming a solution from scratch.
• Students will collect and analyze data.
• Students will understand that analog data can be closely approximated digitally using a sampling technique, which means measuring values of the analog signal at regular intervals called samples and that the samples are measured to figure out the exact bits required to store each sample.

Overview

Session 1:

1. Getting Started (5 min)
2. Guided Activity (45 min)
3. Optional Homework

Session 2:

1. Getting Started (5 min)
2. Guided Activity (45 min)
3. Homework

Part of this lesson was adapted from http://www.nzmaths.co.nz/resource/dartboards and http://www.nzmaths.co.nz/resource/more-dartboards.

## CSP Objectives

• EU CRD-2 - Developers create and innovate using an iterative design process that is user-focused, that incorporates implementation/feedback cycles, and that leaves ample room for experimentation and risk-taking.
• LO CRD-2.E - Develop a program using a development process.
• LO CRD-2.F - Design a program and its user interface.
• EU DAT-2 - Programs can be used to process data, which allows users to discover information and create new knowledge.
• LO DAT-2.E - Explain how programs can be used to gain insight and knowledge from data.
• EU AAP-2 - The way statements are sequenced and combined in a program determines the computed result. Programs incorporate iteration and selection constructs to represent repetition and make decisions to handle varied input values.
• LO AAP-2.B - Represent a step-by-step algorithmic process using sequential code statements.
• LO AAP-2.C - Evaluate expressions that use arithmetic operators.
• LO AAP-2.G - Express an algorithm that uses selection without using a programming language.
• LO AAP-2.H - For selection: a. Write conditional statements. b. Determine the result of conditional statements.
• LO AAP-2.J - Express an algorithm that uses iteration without using a programming language.
• LO AAP-2.K - For iteration: a. Write iteration statements. b. Determine the result or side-effect of iteration statements.
• EU AAP-3 - Programmers break down problems into smaller and more manageable pieces. By creating procedures and leveraging parameters, programmers generalize processes that can be reused. Procedures allow programmers to draw upon existing code that has already been tested, allowing them to write programs more quickly and with more confidence.
• LO AAP-3.A - For procedure calls: a. Write statements to call procedures. b. Determine the result or effect of a procedure call.
• LO AAP-3.F - For simulations: a. Explain how computers can be used to represent real-world phenomena or outcomes. b. Compare simulations with real-world contexts.

## Key Concepts

The development of a program from scratch to solve a specific problem is presented to students by creating a simulation that lets them see how software can model a real-world process. Additionally, the concepts of extensibility and code reuse are shown through hands-on programming experience.

Some real-world problems have digital input data, but many have analog data (sound, light, pictures, motion, temperature, etc.). This needs to be considered when determining how to gather input and what to use as input.

## Essential Questions

• How are algorithms implemented and executed on computers and computational devices?
• How are programs used for creative expression, to satisfy personal curiosity or to create new knowledge?
• How do computer programs implement algorithms?

## Teacher Resources

Student computer usage for this lesson is: required

The Lesson Resources folder contains an example program showing how to use Python's random function to simulate tossing a coin, as well as a Dartboard.py solution. There is also a comparison between this python code, and some pseudocode that one may write before coding Dartboard.py, titled "Pseudocode vs Python."

An alternative lesson outline that uses Runestone and PyCharm to code a simulation and use it to develop, refine, and test hypotheses is in the lesson folder. The lesson is in a file named "Monte Carlo Simulation to Calculate Pi.docx".

# Getting Started (5 min)

Think-Pair-Share: Writing programs "from scratch"

• Ask your students to think about what it means to write a program "from scratch," how and why someone would do so.
• Have your students pair off to share and list their ideas.
• Collect and discuss their responses. Some topics to note could include: the importance of programming to solve a specific problem, the challenge of starting a program without copying someone else's code, and the value of learning to implement algorithms on one's own. The dartboard simulation used in this lesson provides a good example of a straightforward problem with a solution that someone can code on their own from scratch.

# Guided Activity (45 min)

## A Dartboard Simulation [10 min]

Get students' attention by asking them to play with the Dartboard Simulator (requires the Java browser plug-in) as they think about the following scenario and answer the related questions:

• You have a circular dartboard consisting of three concentric circles. The largest outer circle has a radius of 3 units and is colored yellow. Inside that is the middle circle with a 2-unit radius, colored orange. In the center is a red circle, the bull's-eye, with a 1-unit radius. The final board looks like a red circle enclosed by an orange ring, enclosed by a larger yellow ring.
• Now suppose you have a robot that throws darts randomly. How many darts would you expect to hit the bull's-eye if the robot throws 90 darts at your dartboard, assuming every dart hits the board? How many of those darts (on average) would hit the orange part of the board? How many would hit the yellow part?
• To make the dartboard into a game, a certain number of points is awarded each time a dart hits a specific color. You want to assign score points to the dartboard fairly (for our random robot), such that the robot gets a score proportional to the chance it has of hitting a given color on the board. How many points should landing a dart on red be worth? On orange? On yellow?
• Question: What kind of a variable is needed to store the input for this program? Is the input to this program digital or analog? How do you know?
• Recall that the formula for the area of a circle is A = pi*radius^2. The red circle has an area of pi square units. The orange ring has an area of 3*pi square units (4*pi minus the area of the red circle). The yellow ring has an area of 5*pi square units (9*pi minus 3*pi for orange and pi for red). The total area is 9*pi square units. Thus, on average, 50 of the darts would land on yellow, 30 of the darts would land on orange, and 10 darts would land on red.
• For score assignment, since a single dart would hit yellow 5/9 of the time, orange 3/9 of the time, and red 1/9 of the time, we know that red is 5 times harder to hit than yellow and 3 times harder to hit than orange. Thus, a convenient assignment would be 15 points for red, 5 points for orange, and 3 points for yellow. With these score assignments, if the robot throws a dart it would score 5 points on average.
• This data is digital and can be represented with an integer variable. It only represents the final location of the dart. It doesn't supply any information about how it was thrown or other useful information that might be gathered while playing a real dart game. This program ignores the details of the location and reduces the location to a single color area on the board. If the exact position of the dart as it moved over time was desired, then the analog data of location could be closely approximated digitally using a sampling technique, which means measuring values of the analog signal at regular intervals called samples. The samples are measured to figure out the exact bits required to store each sample.
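
The arithmetic in the answers above can be checked with a short calculation. The variable names are illustrative, and the point values are the ones suggested in the score assignment above.

```python
import math

DARTS = 90
total_area = math.pi * 3 ** 2  # whole board, radius 3

# Each ring's area is its outer circle minus the circle inside it.
areas = {
    "red": math.pi * 1 ** 2,
    "orange": math.pi * (2 ** 2 - 1 ** 2),
    "yellow": math.pi * (3 ** 2 - 2 ** 2),
}
points = {"red": 15, "orange": 5, "yellow": 3}

expected_score = 0
for color in areas:
    probability = areas[color] / total_area
    expected_score = expected_score + probability * points[color]
    print(color, "expected darts:", round(probability * DARTS))

print("expected score per dart:", round(expected_score, 2))
```

Each color contributes the same amount (15/9 points) to the expected score, which is what makes the 15, 5, and 3 point assignment fair for a random thrower.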

## Devising a Dartboard Simulation [10 min]

Suppose you want to write a program that simulates tossing virtual darts. Each dart will land at a point on a square virtual dartboard that is one unit long on each side. Each point on this dartboard has both an x and a y coordinate, both of which are between 0 and 1. The bull's-eye is a square in the center of the dartboard with sides of length 0.5 units.

Have the program ask the user how many darts they want thrown. The program should then simulate throwing these darts by generating a random landing location (a random x and a random y coordinate) on the dartboard for each dart. Recall how to use Python's random functions (by reviewing the previous lessons' dice simulation). As darts are thrown, the program counts how many darts land within the center square, the bull's-eye. The bounds of the bull's-eye are [0.25, 0.75] on both the x and y axes. Finally, the program should print out the number of darts thrown and the number that landed within the bull's-eye.

Have your students answer the following questions:

1. Is the problem description clear enough that you could code this program on your own? If so, what information helps to make it clear? If not, what further explanations are needed?
2. What are some variables you think your program will need, and what will you use them for?
3. Are there any built-in Python functions that might be necessary or useful? If so, which ones?
4. What other types of Python constructs are required for this program? Specifically, will you need an input assignment? A loop? An if statement? A print statement? What else?
5. If you do need to use a loop, would it be better to use a while loop or a for loop in this program? Or will either work equally well? Why? If you think you can code the program without using a loop, explain why.
6. Think back to the dice simulation you made. What are two or three ways in which your dartboard program will be similar to the dice simulation? What are two or three ways they will be different?
7. What percentage of darts thrown randomly at the dartboard will hit the bull's-eye?
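
For teacher reference while students discuss these questions, a minimal sketch of the square dartboard program is given here. It is not the Dartboard.py solution from the Lesson Resources folder; the function name `throw_darts` is an assumption, and the number of darts is hard-coded where students would prompt the user.

```python
import random


def throw_darts(number_of_darts):
    """Throw darts at the unit square; return how many hit the bull's-eye."""
    hits = 0
    for _ in range(number_of_darts):
        x = random.random()  # random x coordinate from 0 up to 1
        y = random.random()  # random y coordinate from 0 up to 1
        # The bull's-eye spans 0.25 to 0.75 on both axes.
        if 0.25 <= x <= 0.75 and 0.25 <= y <= 0.75:
            hits = hits + 1
    return hits


# Students would replace 1000 with int(input("How many darts? ")).
darts = 1000
hits = throw_darts(darts)
print("Darts thrown:", darts)
print("Bull's-eye hits:", hits)
```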

## Programming the Simulation [25 min]

Take the rest of the class time to have your students begin programming their simulation. If they are not able to finish before the session ends, you may want to assign the program as homework, or devote the beginning of the second session to finishing the program. They will need their programs for the work in the next session.

You may want to remind students how to use Python's random function. The following code may be a useful example:

Example random coin flipping code (the python file is available in the Lesson Resources folder):

`import random  # Needed for random number generation`

`number_of_heads = 0`

`for i in range(100):`

`    x = random.random()  # Generates a random floating point (decimal) number from 0 up to (but not including) 1`

`    if x > 0.5:`

`        number_of_heads = number_of_heads + 1`

`print("The number of heads in 100 coin flips is", number_of_heads)`

## Optional Homework

Have your students finish their dartboard programs, as they are needed in the next session. Alternatively, if they have finished their programs, you could assign the "Collecting and Analyzing Data" think-pair-share of the next session as a homework, to be discussed in the next session.

# Getting Started (5 min)

Journal: Making your programs extensible

• Define "extensibility" for your students in the context of a program. Focus on the notion that we should plan for the future by keeping our code simple and organized logically. By doing so, we can more easily extend our code (add new features), reuse parts of our code in other programs, and more swiftly modify our program if the requirements change.
• Have your students answer the following questions in their journals: "What are some reasons to make our programs extensible? What are some things that can go wrong if our code is not extensible?"

# Guided Activity (45 min)

## Collecting and Analyzing Data [10 min]

Think-pair-share: Collecting and analyzing data

1. Ask your students to recall what percentage of darts randomly thrown will hit the bull's-eye.
2. With the completed square dartboard program, have your students collect data by running their program entering increasing values for the number of darts to be thrown (1, 10, 100, 500, 1000, 5000, 10000, 50000, and so on), recording the output for each input (number of darts that hit the bull's-eye). For each pair of input and output, have them calculate the percentage that hit the bull's-eye (divide output by input, e.g., 12 hits / 50 thrown = 0.24, or 24% of the darts were hits).
3. Have your students pair off and compare their data with that of their partner. Ask them to describe why the trends they observe in the data are there. Have them recall when they calculated the probability of the robot throwing darts at a circular dartboard.
4. Gather your students and discuss as a class what can be observed from the data. Ideally, as you increase the sample size (number of darts thrown), you should be getting a more accurate measurement of how many darts hit the bull's-eye. As with the dart-throwing robot, the area of the center relative to the whole dartboard determines the percentage that hit. In this case, the center square has an area of 0.25 square units, and the rest of the dartboard has an area of 0.75 square units, so approximately 25% of the randomly thrown darts will hit the bull's-eye.
5. Expand the discussion to consider whether this is a realistic or appropriate model for a real-life situation. What factors would make it more realistic? Have students brainstorm; possibilities include changing levels depending on the skill of the thrower, using data from a real-life dart game to determine percentages, and entering values for angle and force to determine the path of a dart. Point out that the suitability of the simulation depends on how it will be used.
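
The data collection in step 2 can also be automated, which makes the trend easier to see. The sketch below is one possible approach and assumes a helper named `hit_fraction` (not part of the lesson materials) that repeats the square dartboard test and returns the fraction of hits.

```python
import random


def hit_fraction(number_of_darts):
    """Return the fraction of random darts landing in the center square."""
    hits = 0
    for _ in range(number_of_darts):
        x = random.random()
        y = random.random()
        if 0.25 <= x <= 0.75 and 0.25 <= y <= 0.75:
            hits = hits + 1
    return hits / number_of_darts


for darts in [1, 10, 100, 500, 1000, 5000, 10000, 50000]:
    print(darts, "darts:", hit_fraction(darts))
```

Small runs can land far from 0.25, while large runs should cluster close to it. Output will differ on every execution because the darts are random.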

## Changing Requirements [30 min]

Discuss with your students how we often want to reuse our code for a new project, and how it is not uncommon when developing a program for the requirements to change. Both of these changes benefit from extensibility in code. Your students will get to test the extensibility of their dartboard program by reusing what they have to fit with a new objective: make a three-ringed circular dartboard. Point out that an iterative design process often starts with a simpler problem and then generalizes or extends that program to apply to other situations.

As before, this program should first ask a user how many darts they would like to throw. Then, it should use that input to simulate throwing darts at a circular dartboard. Finally, it should print the number of darts thrown, the number of darts that hit the bull's-eye, the number that hit the middle ring, the number that hit the outer ring, and the number that missed completely. This circular dartboard is similar to the one from the previous session's robot exercise: it has a central circular bull's-eye surrounded by a middle ring, which is in turn surrounded by an outer ring. The coordinates for this dartboard are: the center is at coordinate (0,0); the outermost ring is a circle with radius of 3; the middle has a radius of 2; and the bull's-eye has a radius of 1. Simulate throwing a dart by picking random x and y coordinates, each between -3 and 3. Since this range is a square, some darts may miss the dartboard completely. For this program, students may reuse as much of their square dartboard code as they need, but make sure to preserve their original program separately.
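
One way the extended program might look is sketched below for teacher reference. The function name and the dictionary of counts are illustrative assumptions. The sketch compares squared distances from the center so that no square root is needed; students may equally use `math.sqrt`.

```python
import random


def throw_at_circular_board(number_of_darts):
    """Throw darts at the 6-by-6 square around the board and tally the results."""
    counts = {"bull's-eye": 0, "middle ring": 0, "outer ring": 0, "missed": 0}
    for _ in range(number_of_darts):
        x = random.uniform(-3, 3)
        y = random.uniform(-3, 3)
        distance_squared = x * x + y * y  # squared distance from (0, 0)
        if distance_squared <= 1 ** 2:
            counts["bull's-eye"] += 1
        elif distance_squared <= 2 ** 2:
            counts["middle ring"] += 1
        elif distance_squared <= 3 ** 2:
            counts["outer ring"] += 1
        else:
            counts["missed"] += 1
    return counts


# Students would replace 1000 with int(input("How many darts? ")).
darts = 1000
results = throw_at_circular_board(darts)
print("Darts thrown:", darts)
for region, count in results.items():
    print(region, count)
```

The loop, the random coordinates, and the counting pattern carry over from the square dartboard; only the hit test and the number of counters change.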

## Guidance for Practice Questions - Question Set 19

Questions in the AP Classroom Question Bank may be used for summative purposes.

Sixty of the 80 questions are restricted to teacher access.  The remaining 20 questions are from public resources.

Questions are identified by their initial phrases.

A cable television company stores information a...

A large data set contains information about all...

Biologists often attach tracking collars to wil...

## Homework

Have your students finish their circular dartboard programs and answer the following questions:

1. How extensible was your square dartboard simulation? How much code were you able to reuse for your circular dartboard simulation, and what did you need to write from scratch?
2. What are the major differences between the code in the programs and the way you approached coding them? Which did you find more difficult to program? Would it have been easier or faster to code the circular dartboard program from scratch?
3. Collect data by running the circular dartboard program with various increasing inputs (1, 10, 100, 500, 1000, 5000, and so on) and record the output. For each input and corresponding output, calculate the percentage of darts that missed, that hit the outer ring, that hit the middle ring, and that hit the bull's-eye. What trends do you see in the data? How is the behavior similar to and different from that of the square dartboard program?

## Options for Differentiated Instruction

Note: graphics.py may be used with this lesson to create a visualization - http://mcsp.wartburg.edu/zelle/python/ppics2/code/graphics.py

Another note: To run graphics.py, you may need to save it in the same folder as the program being written, for example if network restrictions on student access or storage prevent the import from pointing to a file on a network drive.

## Formative Assessment

The students will produce two simulation programs on their own: the square dartboard simulation and the circular dartboard simulation.

## Summative Assessment

The students will record their understanding of extensibility in their journal.

The students will gain experience collecting data in both sessions.

The students will think analytically about their programs by answering the questions in Session 1 and in the homework for Session 2.

## Lesson Summary

This is the third of three lessons where students will research a computing innovation.

This lesson will focus on:

• Identifying the data used by a computing innovation.
• Explaining how the data is consumed, produced, or transformed.

## CSP Objectives

• EU CRD-1 - Incorporating multiple perspectives through collaboration improves computing innovations as they are developed.
• LO CRD-1.A - Explain how computing innovations are improved through collaboration.
• EU CRD-2 - Developers create and innovate using an iterative design process that is user-focused, that incorporates implementation/feedback cycles, and that leaves ample room for experimentation and risk-taking.
• LO CRD-2.C - Identify input(s) to a program.
• EU IOC-1 - While computing innovations are typically designed to achieve a specific purpose, they may have unintended consequences.
• LO IOC-1.A - Explain how an effect of a computing innovation can be both beneficial and harmful.
• LO IOC-1.B - Explain how a computing innovation can have an impact beyond its intended purpose.

## Key Concepts

Students must understand  how to identify data used in a computing innovation and explain how the data is consumed, produced, or transformed.

# Session 1

## Introduction

Today students will select and research a computing innovation in preparation for a written report about the innovation.

This lesson will focus on:

• Identifying the data used in a computing innovation.
• Explaining how the data is consumed, produced, or transformed.

## Activity 1 - Selection of a computing innovation

Say: To complete this project, students will have to find a somewhat detailed explanation of how the innovation works or, more specifically, how it consumes, produces, or transforms the data it uses. In this activity, students are to iteratively select and research computing innovations until they identify one that interests them and for which they can obtain a description of the data it uses and the way in which the data is consumed, produced, or transformed.

When students have identified a topic with adequate references available, they are to identify the topic for you, describe the data the computing innovation uses, and provide at least one reference for how the data is used.

## Activity 2 - Preparation of the report

Working in pairs, students are to propose and do preliminary research for a report:

1. describing the innovation and its creators
2. identifying the information the innovation uses.
3. describing the component data used to store the information.
4. explaining how the innovation consumes or processes the data.
5. containing a reference section with at least one citation in each of parts 1 - 4.

Students are to:

• Describe the computing innovation they are researching.
• Describe the data the computing innovation uses, both in the sense of the information the innovation needs and the data that comprises that information.
• Prepare an outline of their report.

When students finish their descriptions and their outline - and submit them for initial review - they should continue their research and begin individually writing their reports.

## Session 2

### Activity 1

Reports should be returned to students in pairs.

Teacher feedback is focused on the computing innovation and data the computing innovation is using.

Students read and assess the feedback, and as needed ask questions about the responses.

### Activity 2

Before students resume their reports, ask them to share any resources they found useful in identifying how their innovation uses data.

Students resume researching and writing their reports.

### Wrap up

When the report is finished, each student should submit it individually for assessment.

Think-pair-share:

Ask students to respond to these questions.

Do you think the use of the computing innovation you researched differs from the way its developers originally intended?

Is it possible for the developers of a computing innovation to think of every possible future use of the innovation they created?

How can the rapid sharing of a program or running a program with a large number of users result in significant impacts beyond the intended purpose of the programmer?

## Formative Assessment

Written projects in preliminary form. Verify that students are describing the innovation and have identified the information the innovation is using.

## Summative Assessment

Written projects in final form. Check that students have explained how the data is processed. They should explain what the computer in the computing innovation is doing with the data they identified.

## Lesson Summary

Summary

This lesson teaches students to use simulations to develop, refine and test hypotheses. NetLogo, which is used throughout the lesson to illustrate the use of functional and data abstraction, is a programmable modeling environment for simulating natural and social phenomena.

NetLogo is a variant of the Logo language rather than Python, so students are not expected to write new code in this lesson. See http://www.ianbicking.org/docs/PyLogo_lightning.html for a comparison of Logo and Python.

Outcomes

• Students will understand that models are abstractions of real environments and will recognize the rationale for, and limitations of, modeling techniques to analyze problems.

• Students will be able to compare abstractions in one programming language to another (Python vs NetLogo)
• Students will recognize the use of functional and data abstractions in modeling.

• Students will be able to develop and test hypotheses using an experimental approach in a modeling framework.

Overview

Session 1 - Modeling in NetLogo

• Getting Started (8 min)
• Learning NetLogo (40 min)
• Wrap Up (2 min)

Session 2 - Models and Hypothesis Design

• Getting Started (5 min)
• Models and Hypotheses (20 min)
• Model Selection and Hypothesis Development (25 min)

Session 3 - Hypothesis Testing

• Getting Started (5 min)
• Hypothesis Testing (40 min)
• Wrap Up (5 min)

Note: This lesson introduces another programming tool and environment: NetLogo. Teachers may choose to complete only the first session (on the basics of NetLogo) to expose students to a new computational platform and way of thinking and to extend the ideas in Unit 4 about modeling and simulation.

## CSP Objectives

• EU CRD-1 - Incorporating multiple perspectives through collaboration improves computing innovations as they are developed.
• LO CRD-1.A - Explain how computing innovations are improved through collaboration.
• LO CRD-1.C - Demonstrate effective interpersonal skills during collaboration.
• EU CRD-2 - Developers create and innovate using an iterative design process that is user-focused, that incorporates implementation/feedback cycles, and that leaves ample room for experimentation and risk-taking.
• LO CRD-2.B - Explain how a program or code segment functions.
• EU DAT-1 - The way a computer represents data internally is different from the way the data is interpreted and displayed for the user. Programs are used to translate data into a representation more easily understood by people.
• LO DAT-1.A - Explain how data can be represented using bits.
• EU DAT-2 - Programs can be used to process data, which allows users to discover information and create new knowledge.
• LO DAT-2.D - Extract information from data using a program.
• LO DAT-2.E - Explain how programs can be used to gain insight and knowledge from data.
• EU AAP-1 - To find specific solutions to generalizable problems, programmers represent and organize data in multiple ways.
• LO AAP-1.B - Determine the value of a variable as a result of an assignment.
• EU AAP-2 - The way statements are sequenced and combined in a program determines the computed result. Programs incorporate iteration and selection constructs to represent repetition and make decisions to handle varied input values.
• LO AAP-2.A - Express an algorithm that uses sequencing without using a programming language.
• LO AAP-2.B - Represent a step-by-step algorithmic process using sequential code statements.
• LO AAP-2.C - Evaluate expressions that use arithmetic operators.
• LO AAP-2.H - For selection: a. Write conditional statements. b. Determine the result of conditional statements.
• LO AAP-2.K - For iteration: a. Write iteration statements. b. Determine the result or side-effect of iteration statements.
• EU AAP-3 - Programmers break down problems into smaller and more manageable pieces. By creating procedures and leveraging parameters, programmers generalize processes that can be reused. Procedures allow programmers to draw upon existing code that has already been tested, allowing them to write programs more quickly and with more confidence.
• LO AAP-3.A - For procedure calls: a. Write statements to call procedures. b. Determine the result or effect of a procedure call.
• LO AAP-3.B - Explain how the use of procedural abstraction manages complexity in a program.
• LO AAP-3.D - Select appropriate libraries or existing code segments to use in creating new programs.
• LO AAP-3.E - For generating random values: a. Write expressions to generate possible values. b. Evaluate expressions to determine the possible results.
• LO AAP-3.F - For simulations: a. Explain how computers can be used to represent real-world phenomena or outcomes. b. Compare simulations with real-world contexts.
• EU IOC-1 - While computing innovations are typically designed to achieve a specific purpose, they may have unintended consequences.
• LO IOC-1.A - Explain how an effect of a computing innovation can be both beneficial and harmful.

## Key Concepts

Students will understand that models are an abstraction of real environments and will recognize the rationale for and limitations of modeling techniques to analyze problems.

Students will recognize the use of functional and data abstractions in modeling.

Students will be able to develop and test hypotheses using an experimental approach in a modeling framework.

## Essential Questions

• How can computing extend traditional forms of human expression and experience?
• How does abstraction help us in writing programs, creating computational artifacts and solving problems?
• How can computational models and simulations help generate new understanding and knowledge?
• How are algorithms implemented and executed on computers and computational devices?
• Why are some languages better than others when used to implement algorithms?
• How are programs developed to help people, organizations or society solve problems?
• How are programs used for creative expression, to satisfy personal curiosity or to create new knowledge?
• How do computer programs implement algorithms?
• How does abstraction make the development of computer programs possible?
• Which mathematical and logical concepts are fundamental to computer programming?

## Teacher Resources

Student computer usage for this lesson is: required

NetLogo. http://ccl.northwestern.edu/netlogo/. Center for Connected Learning and Computer-Based Modeling, Northwestern University. Evanston, IL. NetLogo tutorial packet online web version http://ccl.northwestern.edu/netlogo/docs/

Modeling and Simulation 101 video ( https://www.youtube.com/watch?v=X-6zxImekOE )

See http://www.ianbicking.org/docs/PyLogo_lightning.html for a comparison of Logo, PyLogo and Python.

New Mexico "Computer Science for All" bases the entire course on modeling and simulation using NetLogo http://www.cs4all.org/NM-CS108L-Week3-Final

# Session One

## Getting Started (8 min)

Question: Describe something that can be modeled using a simulation.

Introduce modeling and simulation using the first four minutes of the Modeling and Simulation 101 video ( https://www.youtube.com/watch?v=X-6zxImekOE ).  Students open a document for notes for today's session.

Students should record and briefly discuss these four statements about modeling and simulation:

1. Models describe real-world activities using abstraction, enabling testing of hypotheses at a fraction of the cost of actual experiments.
2. Models can be expressed in computational, mathematical, textual, and/or graphical forms.
3. Simulations are forms or instances of models that can be implemented as computer programs.
4. Monte Carlo simulations exploit randomness to arrive at their results.
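The fourth statement can be illustrated with a short Python sketch that estimates pi by picking random points in a square and counting how many fall inside the inscribed circle. The function name `estimate_pi` and the `seed` parameter are choices made for this illustration, and the sample sizes are arbitrary.

```python
import random


def estimate_pi(num_points, seed=None):
    """Estimate pi from the fraction of random points that land inside a unit circle."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(num_points):
        x = rng.uniform(-1, 1)
        y = rng.uniform(-1, 1)
        if x * x + y * y <= 1:  # the point is inside the circle of radius 1
            inside += 1
    # The circle covers pi/4 of the 2-by-2 square, so scale the fraction by 4.
    return 4 * inside / num_points


for n in (100, 10_000, 100_000):
    print(n, estimate_pi(n, seed=42))
```

Estimates from larger samples tend to land closer to 3.14159, which connects this statement to the dartboard simulations from the earlier lesson.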

# Learning NetLogo (40 min)

To start, all students should download NetLogo from this link http://ccl.northwestern.edu/netlogo/ or use the web version of the program.

Students should work through the NetLogo tutorial packet either in groups or as a class.

In particular, students should be encouraged to notice commonalities in programming languages (sequence, conditionals, iteration, abstraction) and how differences in languages provide specific tools best suited to particular problems. The domain of modeling and simulation is a huge area in computational thinking, and NetLogo is one of many languages well suited to problem-solving in this domain. Point out that Python is also used for modeling and simulation, but for the code to be clear and readable, it must build on libraries that supply the abstractions a language like NetLogo provides built in.

There will be some "thought questions" throughout that students should discuss in their groups and as a class.

# Wrap Up (2 min)

Students should complete an exit ticket listing one interesting idea they learned, or one question they have about NetLogo or modeling.

# Session Two

# Getting Started (5 min)

Review yesterday's NetLogo lesson and ask the students to share what they learned, how NetLogo is similar to or different from Python, and any questions they have about how it works. Point out that it is the ABSTRACTION available in NetLogo that makes it easier to read and write simulation programs because it has features built into it that are readily available that you can build on. This PowerPoint details the abstractions in NetLogo with the accompanying transcript from CS for All in New Mexico. List abstractions available in Python that are different from NetLogo or PyLogo. What is built into each language that makes it easy to use?
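To make the discussion of built-in abstractions concrete, the sketch below shows roughly what a Python programmer has to write by hand to get agents that NetLogo supplies as turtles, with primitives such as `forward` and `right` and the `ask turtles [ ... ]` construct. The `Agent` class and `run_model` function are hypothetical names for this illustration and are not part of NetLogo or any Python library.

```python
import math
import random


class Agent:
    """A hand-built stand-in for a NetLogo turtle: a position plus a heading."""

    def __init__(self, rng):
        self.x = 0.0
        self.y = 0.0
        self.heading = rng.uniform(0, 360)  # degrees, with 0 pointing up as in NetLogo

    def right(self, degrees):
        """Turn clockwise by the given number of degrees."""
        self.heading = (self.heading + degrees) % 360

    def forward(self, distance):
        """Move the given distance in the direction of the current heading."""
        radians = math.radians(self.heading)
        self.x += distance * math.sin(radians)
        self.y += distance * math.cos(radians)


def run_model(num_agents=10, steps=100, seed=0):
    """Create the agents and let each one wander for the given number of steps."""
    rng = random.Random(seed)
    agents = [Agent(rng) for _ in range(num_agents)]
    for _ in range(steps):
        for agent in agents:  # plays roughly the role of `ask turtles [ ... ]`
            agent.right(rng.uniform(-30, 30))
            agent.forward(1)
    return agents


for agent in run_model(num_agents=3, steps=50):
    print(round(agent.x, 2), round(agent.y, 2))
```

Students can compare how much of this code is bookkeeping that NetLogo hides, and which Python libraries they already know hide similar details.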

# Models and Hypotheses (20 min)

## Introduction (12 min)

• Students start NetLogo and open the Art>Fireworks model.
• Students use the interface Buttons for Setup and Go to run the simulation.
• Have students read the information in the Info tab, and look through the code in the Code tab, to get a sense of what is being simulated and how it works.
• Explain that the model is the description of the environment, while the simulation is the specific implementation of the model.
• Think-Pair-Share: Ask the students to identify which aspects of the real-world environment (actual fireworks) are implemented in the model, and which aspects are not implemented.
• Discuss with the class the fact that choosing simpler characteristics for the model (abstraction) makes the model much easier to implement, at the expense of possibly failing to represent key relationships. Also point out that this decision may reflect the bias of the research designers.
• Have the students examine the implementation (Code tab) and find the names of the five functions defined in the code. Show where all five are called (three in the code and two in the Interface).
• Discuss these facts about forming a hypothesis:
  • A hypothesis is an educated guess about how things work.
  • A hypothesis is written like this: "If _____[I do] _____, then _____[this will result]_____."
• Explore hypotheses that could be tested by this model. (Some hypotheses are suggested by the discussion in the Info tab, e.g., "If gravity is set to 0, then ______.")
  • What are two things that we can change through the interface?
  • What do we think will happen when we make those changes?
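The "If ..., then ..." form can also be shown with a tiny Python model of a single particle launched straight up. This is not the code of the NetLogo Fireworks model; `peak_height`, its parameters, and the chosen values are all hypothetical, picked only to show how changing one parameter tests a hypothesis such as "If gravity is doubled, then the particle peaks at about half the height."

```python
def peak_height(initial_velocity, gravity, steps=200, dt=0.1):
    """Return the highest point reached by a particle launched straight up."""
    height = 0.0
    velocity = initial_velocity
    peak = 0.0
    for _ in range(steps):
        velocity -= gravity * dt  # gravity slows the particle, then pulls it down
        height += velocity * dt
        if height < 0:  # the particle has fallen back to the ground
            break
        peak = max(peak, height)
    return peak


# Vary only the gravity parameter and compare the measurement.
for gravity in (0, 1, 2):
    print("gravity =", gravity, "peak height =", round(peak_height(10, gravity), 1))
```

With gravity set to 0 the particle never comes back down, so the reported peak is limited only by the number of steps simulated. That is a useful prompt for discussing what a model leaves out.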

## Activity (8 min)

Ask each student to write a hypothesis that can be tested with this simulation, share the hypothesis with elbow partners, and briefly experiment with the parameters to informally test the hypothesis.

# Model Selection and Hypothesis Development (25 min)

Note: The "Hypothesis Testing Worksheet," which will be used for the next two lessons, is available in the Lesson Resources Folder.

For the rest of today's session and Session 3, students will work in teams of four students to select a model to experiment with, then divide into two partner sets to develop a hypothesis, devise an experimental plan, test the hypothesis, and write about their results.

Directions

1. First divide the class into teams of four students, and have them spend 10 minutes exploring different models in NetLogo (which they should already have some familiarity with from Session 1 and the earlier part of Session 2). They should choose one model to focus on and write the name of the model in the corresponding part of the Hypothesis Testing worksheet, along with a short explanation of why they chose the model.
2. For the next 10 minutes, they should study the parameters of the model and how it works, trying things out in the interface to see how changing the parameters affects behavior.
3. In the final 5 minutes of the class session, split the teams into partner groups and have each partner group begin to develop hypotheses about the team's selected model.  You may wish to assign them to write these hypotheses down as a homework assignment, either together or individually, using the Hypothesis Testing worksheet to record their hypothesis, why they selected that hypothesis, and the experimental parameters they will use to test the hypothesis.

# Session Three

# Warm Up (5 min)

Partners should revisit their hypotheses, and choose one hypothesis to focus on first.  (They can test both hypotheses if they have time.)  Each partner pair should write the name of their model and their selected hypothesis on the board, to share with the other students.

# Experimental Design (10 min)

Students should use the Hypothesis Testing worksheet to:

1. Identify an experiment that will test their hypothesis.
2. Select one or more parameters to vary.
3. Choose what settings they will use for these parameters.
4. Determine what measurements they will record.
5. Decide how many trials they will run at each setting.

Note: Be sure that students have completed their experimental design before proceeding to perform the experiment. Many students will want to just jump in and start trying things, but one goal of the lesson is to help them understand that having a clear hypothesis and experimental design before collecting data is an important part of the scientific process.
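The same design steps (a parameter to vary, the settings to try, a measurement, and a number of trials per setting) can be sketched in Python. The `run_model` function below is a made-up stand-in for a NetLogo model: a fire spreads along a row of cells until it reaches an empty one. It is not taken from the NetLogo Models Library, and the settings and trial count are arbitrary.

```python
import random
import statistics


def run_model(density, rng, length=50):
    """Hypothetical stand-in model: count cells burned before the fire hits a gap."""
    burned = 0
    for _ in range(length):
        if rng.random() < density:  # this cell holds a tree, so the fire spreads
            burned += 1
        else:  # an empty cell stops the fire
            break
    return burned


def run_experiment(settings, trials, seed=0):
    """Run several trials at each setting and record the mean and spread."""
    rng = random.Random(seed)
    results = {}
    for setting in settings:
        measurements = [run_model(setting, rng) for _ in range(trials)]
        results[setting] = (
            statistics.mean(measurements),
            statistics.stdev(measurements),
        )
    return results


for density, (mean, spread) in run_experiment([0.5, 0.7, 0.9], trials=30).items():
    print(f"density={density}: mean burned={mean:.1f}, std dev={spread:.1f}")
```

Because each run is random, a single trial per setting can mislead. Repeating trials and recording the spread is what lets students judge whether a difference between settings is meaningful.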

# Hypothesis Testing (30 min)

For the next twenty to thirty minutes, students should carry out their experiments and record the appropriate measurements.

At the end of the section or for homework, students should write up their findings in a short report, showing the data they've collected (optionally in a graphical form, particularly if assigned as homework), discussing what the data says about their hypothesis, and concluding whether the hypothesis is supported or refuted by the simulation.

# Wrap Up (5 min)

Students should come back into their teams to share their findings, and discuss the advantages and disadvantages of using models and simulations to develop and test hypotheses.

## Formative Assessment

Students will share and post their hypotheses before testing and sharing the results. Teachers will verify that the hypotheses are falsifiable and testable by the simulations.

## Summative Assessment

Students will select a model, develop a hypothesis, design an experiment, and use a simulation to test the hypothesis.

## Lesson Summary

Pre-Lesson Preparation: Students need to have already chosen a topic and had it approved by the instructor. Students can use the following sources to help choose a data set:

Summary:

This lesson is the summative assessment for Unit 4 on Data Analysis. Students will select a data set and write a small Python program to analyze the data. Students will then write a summary of their findings to demonstrate understanding of the data analysis process.

Outcomes:

• This unit assessment is designed to provide more practice in project-based work to prepare students for the final performance project at the end of this class.

Overview:

1. Getting Started (5 min) - Overview of task for the day.
2. Independent Activity (40 min) - Individually or in pairs, students collect and analyze data on their topic.
3. Wrap Up (5 min) - Overview of homework goals and expectations.
4. Homework: Individual two-page summary about their findings.

## Math Common Core Practice:

• MP1: Make sense of problems and persevere in solving them.
• MP2: Reason abstractly and quantitatively.
• MP3: Construct viable arguments and critique the reasoning of others.
• MP4: Model with mathematics.
• MP5: Use appropriate tools strategically.
• MP6: Attend to precision.
• MP7: Look for and make use of structure.

## Common Core Math:

• S-ID.1-4: Summarize, represent, and interpret data on a single count or measurement variable
• S-ID.5-6: Summarize, represent, and interpret data on two categorical and quantitative variables
• S-ID.7-9: Interpret linear models
• S-IC.3-6: Make inferences and justify conclusions from sample surveys, experiments and observational studies

## Common Core ELA:

• RST 12.3 - Precisely follow a complex multistep procedure
• WHST 12.5 - Develop and strengthen writing as needed by planning, revising, editing, rewriting
• WHST 12.6 - Use technology, including the Internet, to produce, publish, and update writing products
• WHST 12.7 - Conduct short as well as more sustained research projects to answer a question

## NGSS Practices:

• 1. Asking questions (for science) and defining problems (for engineering)
• 2. Developing and using models
• 3. Planning and carrying out investigations
• 4. Analyzing and interpreting data
• 5. Using mathematics and computational thinking

## Key Concepts

Students will demonstrate their understanding of the process of collecting and evaluating data.

## Essential Questions

• How can computation be employed to help people process data and information to gain insight and knowledge?
• How can computation be employed to facilitate exploration and discovery when working with data?
• What opportunities do large data sets provide for solving problems and creating knowledge?
• How are algorithms implemented and executed on computers and computational devices?
• How are algorithms evaluated?
• How are programs developed to help people, organizations or society solve problems?
• How are programs used for creative expression, to satisfy personal curiosity or to create new knowledge?
• How do computer programs implement algorithms?
• How do people develop and test computer programs?
• Which mathematical and logical concepts are fundamental to computer programming?

## Teacher Resources

Student computer usage for this lesson is: required

The rubric (Rubric - Unit 4 Summative Assessment.htm) is provided on Google Drive in the lesson folder.

# Getting Started (5 min)

Verify that every student has selected a topic (approved by the instructor in advance) and explain the goal for today.

# Independent Activity (40 min)

Students will either individually or in pairs (instructor's decision) create a small program that reads data from a file, analyzes it, creates a simple simulation and finally writes data to a file.
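The four parts of the task (read, analyze, simulate, write) might fit together as in the sketch below. It is only an outline under stated assumptions: the CSV layout, the column name `temperature`, the sample values, and all function names are placeholders invented for the demonstration, and resampling the data with replacement is just one possible "simple simulation."

```python
import csv
import random
import statistics
import tempfile
from pathlib import Path


def read_column(path, column):
    """Read one numeric column from a CSV file that has a header row."""
    with open(path, newline="") as f:
        return [float(row[column]) for row in csv.DictReader(f)]


def summarize(values):
    """Compute a few basic statistics for a list of numbers."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "min": min(values),
        "max": max(values),
    }


def simulate_means(values, runs, rng):
    """Simple simulation: resample the data with replacement and record each mean."""
    return [statistics.mean(rng.choices(values, k=len(values))) for _ in range(runs)]


def write_summary(path, summary):
    """Write the statistics to a two-column CSV file."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["statistic", "value"])
        for name, value in summary.items():
            writer.writerow([name, value])


# Demonstration with a tiny placeholder file (made-up values, not real data).
with tempfile.TemporaryDirectory() as folder:
    data_path = Path(folder) / "sample.csv"
    data_path.write_text("day,temperature\n1,20\n2,22\n3,19\n4,25\n")

    temps = read_column(data_path, "temperature")
    summary = summarize(temps)
    simulated = simulate_means(temps, runs=100, rng=random.Random(0))
    summary["simulated_mean_low"] = min(simulated)
    summary["simulated_mean_high"] = max(simulated)

    out_path = Path(folder) / "summary.csv"
    write_summary(out_path, summary)
    print(out_path.read_text())
```

Students would replace the placeholder file with their approved data set and choose statistics and a simulation that suit their own question.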

# Wrap Up (5 min)

Presentation about the expectations of the homework assignments.

# Homework

Each student should create a 2-page typed summary that explains the following areas:

• The chosen data
• Why the data topic was chosen
• The analysis process
• The results of the analysis process
• The coding process used for data analysis

## Options for Differentiated Instruction

Instructor has the option to have students work individually or in pairs for this assessment.

## Formative Assessment

Review Rubric with class and clarify expectations.

## Summative Assessment

Students will be assigned a unit project, with a topic of their choice, to demonstrate their understanding and mastery of the concepts of data collection and analysis.