Unit 4. Data Acquisition
Revision Date: Sep 20, 2015 (Version 1.2)
Summary
In this lesson, students will learn how to acquire and analyze data to find answers to questions and solutions to problems. Students will consider whether or not the data they are presented with is necessarily valid, and research some of the various data sources online.
Outcome
Overview
Session 1
Session 2
Students will be able to acquire data and analyze it to find answers to a specific question or solutions for a specific problem.
Student computer usage for this lesson is: required
Student computer usage for this lesson is: optional
In the Lesson Resources Folder
Webpages Session 1
Webpages Session 2
For this session, use the presentation "Finding Data" in the Lesson Resources Folder.
Given this data: [slide 1]
A blood drive at the local high school reveals that 20% of the students were HIV positive.
Journal on these questions:
Lead the students in discussion using the bullets below and slide 2 of the PowerPoint as guidance. Students should talk about WHY they assumed the data was true, or were uncomfortable questioning the truth of the data.
Part 1 - Discussion
Data comes from many places and takes many forms [slide 3]
Part 2 - Brainstorm
Brainstorm as a class: what kinds of data are generated? Possible answers:
Give students the worksheet: Homework Unit 4 Lesson 1.
There are 10 videos to choose from, each 10-15 minutes long. Either allow students to self-select, or assign them a particular video. Students should watch the video and answer the questions on the worksheet. This is an opportunity to discuss plagiarism: students are expected to watch the video and write from their own experience.
For this session, use the presentation "Finding and Analyzing Data" in the Lesson Resources Folder.
Students should journal on the following: Describe at least 2 ways that we create meaning out of data. [slide 1]
Part 1: Correlation vs. Causation
Part 2: Data Science
Businesses like Amazon and Netflix learn the habits of different customers and make recommendations based on their previous choices and on the choices of others who share similar characteristics (as Google ads do).
See if anybody knows the story of Moneyball (based on a true story), in which a baseball team made decisions based on data analysis to become winners: https://en.wikipedia.org/wiki/Moneyball_(film). Also share the story of Vivek Ranadivé, who knew little about basketball but owned a multi-million dollar computer processing company and knew how to choose and analyze data. He coached his then twelve-year-old daughter’s National Junior Championship basketball team to the national championship game. He relied on his knowledge of soccer and cricket, paired with his analytic mindset, to create a system of play that allowed his relatively un-athletic team to excel. From the moment that he used intellect and his business experience to coach an inexperienced team to the championship game, the man who once thought basketball was “mindless” was hooked on the sport. http://www.forbes.com/sites/aliciajessop/2013/05/28/why-the-kings-are-staying-in-sacramento-meet-vivek-ranadive/
If time is short, choose only 1 or 2 of the questions from the homework to be presented to the class and collect the rest to grade.
In your writing journal, map out the steps to answer a specific question or find a solution to solve a specific problem using data.
Data analysis activities from NOAA, NASA, and more! - http://climate-expeditions.org/educators/activities.html
What is data acquisition? - http://www.ni.com/data-acquisition/what-is/
Data analysis and graphs (with Excel sample) - http://www.sciencebuddies.org/science-fair-projects/project_data_analysis.shtml
Collecting and analyzing data - http://ctb.ku.edu/en/table-of-contents/evaluate/evaluate-community-interventions/collect-analyze-data/main
Using Excel for Handling, Graphing, and Analyzing Scientific Data: A Resource for Science and Mathematics Students - http://academic.pgcc.edu/psc/Excel_booklet.pdf
Journal day 1:
Given this fictitious data:
A blood drive at the local high school reveals that 20% of the students were HIV positive.
Journal day 2: Describe at least 2 ways that we create meaning out of data.
Homework: Feedback from a TED video on big data
Students complete the Data Search and Analysis student activity.
Write an outline of an algorithm to make a data-based decision about what movie to produce or what sports team member to hire.
Unit 4. Data Acquisition
Revision Date: Oct 27, 2015 (Version 1.2)
Summary
Students will define and identify models and simulations. They will work in groups to propose a simulation that could be used to investigate a hypothesis.
Outcomes
Overview
Source
Some of the ideas in this lesson were adapted from the CS10K community site, https://sites.google.com/site/mobilecsp/lesson-plans/realworldmodels.
Student computer usage for this lesson is: optional
These videos supplement the material covered in this lesson:
Introduce Vocabulary
Most of the time we need to use experimental instead of theoretical probability. Have students think about this as they answer the following questions with a partner:
Discussion: As a whole class, discuss how the students determined their answers to each question. (Did some students want to use the computer to access data or actually perform the paper throwing experiment?)
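If students ask how a computer could stand in for the physical experiment, the difference between experimental and theoretical probability can be shown with a short program. The sketch below is illustrative only: the function name experimental_probability and the choice of a six-sided die are assumptions, not part of the lesson materials.

```python
import random


def experimental_probability(trials, target=6, sides=6, seed=None):
    """Estimate the chance of rolling `target` by simulating die rolls."""
    rng = random.Random(seed)  # a seed makes a run repeatable for discussion
    hits = 0
    for _ in range(trials):
        if rng.randint(1, sides) == target:  # randint includes both endpoints
            hits += 1
    return hits / trials


theoretical = 1 / 6  # one favorable outcome out of six equally likely ones

for trials in (10, 100, 10000):
    estimate = experimental_probability(trials, seed=trials)
    print(trials, "rolls: experimental =", round(estimate, 3),
          "theoretical =", round(theoretical, 3))
```

Running it with more trials usually brings the experimental value closer to the theoretical one, which is a useful point to raise in the discussion.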
Use the above examples to define models and simulations to the students:
(Vocabulary from: http://www.systems-thinking.org/modsim/modsim.htm.)
Examples of models (do not need to show the entire videos for student understanding):
Watch this video of a human heart simulation: Multi-scale Multi-physics Heart Simulator UT-Heart (5:15) (watch up to 2:00; the rest is interesting but not necessary).
What’s an advantage to having so many data points? What about a disadvantage? (A supercomputer is necessary to run the simulation.)
How can you test a parachute to be used on Mars? Watch https://www.youtube.com/watch?v=_jOzxEOlDJg (1:11). Describe the physical test. Before that test, engineers create models and run simulations on the computer. Why? (It is very costly to build an actual parachute and run a physical test. First be sure an idea passes a simulated test, then build it.)
Have students list models they have seen (and have interacted with) in each of the following (time permitting, have the students find websites to share):
Examples of Simulations:
Have students find and share simulations in each of the following:
In this activity, the class will match a person and a character in a contest and propose models and simulations to predict the winner.
Have the class suggest people, characters, and activities to fill the chart (start with a blank chart; entries below are for example only):
Person | Character | Activity |
RG III | Sponge Bob | Ping-pong |
Bill Gates | Elektra | Jeopardy |
Shakira | Harry Potter | Cooking |
Journal: Have students record the definitions (in their own words) of the vocabulary used in this lesson: probability, model, simulation, and hypothesis.
Unit 4. Data Acquisition
Revision Date: Sep 21, 2015 (Version 1.2)
Summary
Students will formulate a hypothesis, run simulations, and analyze the results to determine what needs to be modified in their hypothesis and/or the simulation itself.
Outcomes
Overview
Source
The coin flipping extension is based on a CS10K lesson: https://sites.google.com/site/mobilecsp/lesson-plans/lp-coinflip-miniprojects
Students will be able to:
Student computer usage for this lesson is: required
The PowerPoint "Using Data and Simulations" can be found in the Lesson Resources folder.
Penny Bias article to go with lesson extension: http://mathtourist.blogspot.com/2011/02/penny-bias.html
For the Monty Hall Problem extension:
Online simulation of the problem: http://math.ucsd.edu/~crypto/cgi-bin/MontyKnows/monty2?1+17427
There are several videos on YouTube demonstrating and explaining the Monty Hall Problem.
An animated video: https://www.youtube.com/watch?v=mhlc7peGlGg length is 5:48
Live action video: https://www.youtube.com/watch?v=4Lb-6rxZxx0 length is 5:30
There is sample code for the die Python program in the Lesson Resources Folder called 4-3 Sample Code.py
Journal:
Have students share their answers with the class.
Put
import random
at the top of the code.
random.randint(min,max)
returns a 'random' integer between the min and max values (inclusive).
Ask the class: Does your program represent a sufficient simulation for rolling dice? What are the advantages/disadvantages of using a program vs. actual dice?
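For reference, a die-rolling simulation built on random.randint might look like the sketch below. It is not the sample code from the Lesson Resources Folder; the function names roll_die and roll_pair_totals are made up for illustration, and tallying the totals of two dice is an assumed example of a hypothesis students might test.

```python
import random


def roll_die(sides=6):
    """Simulate one roll of a fair die with the given number of sides."""
    return random.randint(1, sides)  # both 1 and `sides` can be returned


def roll_pair_totals(number_of_rolls):
    """Roll two six-sided dice repeatedly and tally each total (2 to 12)."""
    totals = {total: 0 for total in range(2, 13)}
    for _ in range(number_of_rolls):
        totals[roll_die() + roll_die()] += 1
    return totals


tallies = roll_pair_totals(1000)
for total in sorted(tallies):
    print(total, tallies[total])
```

Students can compare the printed tallies with their hypothesis, then change the number of rolls to see how the results vary from run to run.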
Journal: Summarize how a program can be used as a simulation to test a hypothesis.
Students can be provided with the code for a function to simulate rolling one die and use it to develop the rest of the program.
After the first group activity, the teacher can swap a student from each group to allow different input into the next group activity.
This extension is based on advanced mini project # 4, which can be found here: https://docs.google.com/a/smcps.org/document/d/1AKHpiQ87bE4W1YzHlAFh2uNAHuEtdMOCQVV6HfxfDzc/edit
Students read an article about the 'randomness' of flipping a penny: http://mathtourist.blogspot.com/2011/02/penny-bias.html
Next, students should hypothesize the results of lining up 10 pennies on edge and knocking them over (as described in the article). Students need to determine how many times to run the experiment, collect data, and analyze the results.
Students should work in pairs to write a computer simulation for the penny experiment. (Note: this is a program based on experimental data, not theoretical.)
Discuss as a class the validity of the simulation written. Can this simulation be used for other coins?
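One way a pair might structure the penny simulation is sketched below. The heads fraction of 0.7 is a placeholder, not a figure from the article or from any experiment; students should replace it with the fraction they measured. The function names are hypothetical.

```python
import random

# ASSUMPTION: placeholder value. Replace it with the fraction of heads
# the class actually observed in its own penny experiment.
OBSERVED_HEADS_FRACTION = 0.7


def knock_over_pennies(number_of_pennies=10,
                       heads_fraction=OBSERVED_HEADS_FRACTION):
    """Simulate one trial: return how many pennies land heads up."""
    heads = 0
    for _ in range(number_of_pennies):
        if random.random() < heads_fraction:  # random() is in [0.0, 1.0)
            heads += 1
    return heads


def average_heads(trials, number_of_pennies=10,
                  heads_fraction=OBSERVED_HEADS_FRACTION):
    """Repeat the trial many times and return the mean number of heads."""
    total = sum(knock_over_pennies(number_of_pennies, heads_fraction)
                for _ in range(trials))
    return total / trials


print("Average heads per 10 pennies:", average_heads(1000))
```

Because the probability comes from measured data, using the simulation for a different coin would require running the physical experiment again with that coin.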
In the game show "Let's Make a Deal", the original host was Monty Hall. On every show, Monty would present a player with three doors or curtains to choose from. The contestant was asked to choose a door in search of a prize. After the contestant made a selection, Monty would open one of the doors not selected to reveal a non-prize (perhaps a goat). Then Monty would ask if the contestant wanted to change their choice.
After explaining the show to the class ask, "Should the contestant change?" Students should propose a hypothesis.
Have the students design a simulation to test their hypothesis (discuss what data should be collected and how many times the simulation should run). After running the simulation, students should evaluate their hypothesis and determine whether it needs to be modified or whether the simulation needs to be modified.
If Monty Hall had four doors, what should the contestant do?
What should the contestant do if they know that Monty does not know what is behind each door?
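For teachers who want a reference point when reviewing student designs, the sketch below shows one possible simulation. It assumes Monty always knows where the prize is and always opens exactly one non-prize door; the names play_round and win_rate and the doors parameter are illustrative, not from the lesson materials.

```python
import random


def play_round(switch, doors=3):
    """Play one game; return True if the contestant wins the prize."""
    prize = random.randrange(doors)
    choice = random.randrange(doors)
    # Monty knows where the prize is and opens a non-prize door
    # that the contestant did not pick.
    opened = random.choice([d for d in range(doors)
                            if d != choice and d != prize])
    if switch:
        # Move to a randomly chosen door that is neither the original
        # pick nor the door Monty opened.
        choice = random.choice([d for d in range(doors)
                                if d != choice and d != opened])
    return choice == prize


def win_rate(switch, rounds=10000, doors=3):
    """Fraction of rounds won when always staying or always switching."""
    wins = sum(play_round(switch, doors) for _ in range(rounds))
    return wins / rounds


print("Win rate when staying:  ", win_rate(switch=False))
print("Win rate when switching:", win_rate(switch=True))
```

Calling win_rate with doors=4 gives students a starting point for the four-door question. The case where Monty does not know what is behind each door needs a different rule for choosing the opened door, so it is left for students to design.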
Review student journal entries and class discussions to determine students' understanding of simulations, a hypothesis, and the ability to determine a method to test a hypothesis.
Describe an algorithm to simulate drawing an ace of any suit from a standard deck of cards.
Make a hypothesis about drawing cards from a standard deck of cards and determine how to collect data to answer your hypothesis.
Unit 4. Data Acquisition
Revision Date: Sep 28, 2015 (Version 1.2)
Summary
This lesson introduces students to reading information from an input file and writing to an output file as a functionality of Python programming. The students will then apply these concepts to program a simple Dice Roll application to generate data. This lesson will prepare students to read and write files for use in later Data Acquisition lessons.
Outcomes
Overview
Session 1
Session 2
countif. (This exercise may be assigned as homework if students have the computing resources to complete a programming assignment as homework.)
The students must understand how to open and read from an input file using Python.
The students must understand how to declare and write to an output file using Python.
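The two file skills named above can be previewed with a short program. This is a minimal sketch, not an excerpt from Python for Informatics: the function names and the file name dice_rolls_demo.txt are hypothetical, and the file is written to a temporary folder so the demo does not clutter a student's work area.

```python
import os
import random
import tempfile


def write_rolls(path, number_of_rolls):
    """Write one simulated die roll per line to an output file."""
    with open(path, "w") as output_file:  # "w" creates or overwrites the file
        for _ in range(number_of_rolls):
            output_file.write(str(random.randint(1, 6)) + "\n")


def read_rolls(path):
    """Read the rolls back from the file as a list of integers."""
    rolls = []
    with open(path) as input_file:  # the default mode is "r" (read)
        for line in input_file:
            rolls.append(int(line.strip()))
    return rolls


# Hypothetical file name, placed in a temporary folder for this demo.
demo_path = os.path.join(tempfile.gettempdir(), "dice_rolls_demo.txt")
write_rolls(demo_path, 20)
print(read_rolls(demo_path))
```

The with statement closes each file automatically, which avoids the common mistake of forgetting to call close() after writing.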
Student computer usage for this lesson is: required
Python for Informatics by Charles Severance, http://www.pythonlearn.com/book.php.
Explanation of the CountIf function in Excel http://office.microsoft.com/en-us/excel-help/countif-HP005209029.aspx.
The mbox.txt and mbox-short.txt files are in the Lesson Resources Folder.
Think-Pair-Share
What are the advantages and disadvantages of:
Have students review their journal entries as a class and note the advantages and disadvantages on a white board.
The students should code the examples in the book as the teacher proceeds through the lessons.
Introduce the COUNTIF function in Excel. For example,
=COUNTIF(A1:A1000,1)
counts how many 1s are in the range A1 to A1000. You can show the example on the Microsoft Office help website: http://office.microsoft.com/en-us/excel-help/countif-HP005209029.aspx
Use the COUNTIF function to compare the distribution of the rolls (how many times each number 2 through 12 was rolled) for the pair of six-sided dice with the distribution for the 12-sided die.
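For classes that would rather stay in Python than switch to Excel, the same counting idea can be sketched as follows. The helper count_if is a made-up name that imitates COUNTIF for exact matches only, and the two roll generators are assumptions about how the dice data was produced.

```python
import random


def pair_of_six_sided(rolls):
    """Totals from rolling two six-sided dice `rolls` times."""
    return [random.randint(1, 6) + random.randint(1, 6) for _ in range(rolls)]


def one_twelve_sided(rolls):
    """Results from rolling one 12-sided die `rolls` times."""
    return [random.randint(1, 12) for _ in range(rolls)]


def count_if(values, target):
    """Like =COUNTIF(range, target): how many values equal target."""
    return sum(1 for value in values if value == target)


pair_rolls = pair_of_six_sided(1000)
single_rolls = one_twelve_sided(1000)

print("Total  Pair of d6  One d12")
for total in range(2, 13):
    print(total, count_if(pair_rolls, total), count_if(single_rolls, total))
```

Printing the two columns side by side lets students see for themselves how the shapes of the two distributions differ.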
Have students work in pairs as the new concepts are introduced and practiced.
For a class needing more scaffolding: Work as a group. Have students take turns around the room to read aloud the brief text in each section in Chapter 7. Do the short exercises together with a "row captain" assigned to each row (or group) in the classroom who is in charge of checking that everybody in their row has completed each short task and has gotten the help needed to finish. Row captains help each other until the entire class has successfully completed each task. Report out on what challenges were encountered, recording problems and solutions at the front of the classroom as the class works. Rotate the role of row captain for each section.
For more independent students: Introduce/demonstrate the key ideas first and then allow students to work through Chapter 7 at their own pace.
The teacher will check the students’ code for understanding.
The teacher will check for understanding as each new concept is introduced.
Exercise 7.1 Write a program to read through a file and print the contents of the file (line by line) all in upper case. Executing the program will look as follows:
python shout.py
Enter a file name: mbox-short.txt
FROM STEPHEN.MARQUARD@UCT.AC.ZA SAT JAN 5 09:14:16 2008
RETURN-PATH: <POSTMASTER@COLLAB.SAKAIPROJECT.ORG>
RECEIVED: FROM MURDER (MAIL.UMICH.EDU [141.211.14.90])
BY FRANKENSTEIN.MAIL.UMICH.EDU (CYRUS V2.3.8) WITH LMTPA;
SAT, 05 JAN 2008 09:14:16 -0500
You can download the sample input file from www.py4inf.com/code/mbox-short.txt
Exercise 7.2 Write a program to prompt for a file name, and then read through the file and look for lines of the form:
X-DSPAM-Confidence: 0.8475
When you encounter a line that starts with “X-DSPAM-Confidence:” pull apart the line to extract the floating point number on the line. Count these lines and then compute the total of the spam confidence values from these lines. When you reach the end of the file, print out the average spam confidence.
Enter the file name: mbox.txt
Average spam confidence: 0.894128046745
Enter the file name: mbox-short.txt
Average spam confidence: 0.750718518519
Test your file on the mbox.txt and mbox-short.txt files.
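When checking student work on Exercise 7.2, it may help to have one possible approach in mind. The sketch below is not the book's solution: the function names are hypothetical, and the sample lines are made up rather than taken from mbox.txt, so the printed value will not match the averages shown above.

```python
PREFIX = "X-DSPAM-Confidence:"


def average_spam_confidence(lines):
    """Return the mean of the numbers on lines that start with PREFIX."""
    count = 0
    total = 0.0
    for line in lines:
        if line.startswith(PREFIX):
            # Everything after the prefix is the number; float() ignores
            # the surrounding spaces and the trailing newline.
            total += float(line[len(PREFIX):])
            count += 1
    if count == 0:
        return None  # no matching lines, so there is no average
    return total / count


def average_for_file(file_name):
    """Open the named file and average its spam confidence values."""
    with open(file_name) as handle:
        return average_spam_confidence(handle)


# Made-up sample lines, not taken from mbox.txt.
sample = [
    "From someone@example.com Sat Jan  5 09:14:16 2008\n",
    "X-DSPAM-Confidence: 0.75\n",
    "X-DSPAM-Probability: 0.0000\n",
    "X-DSPAM-Confidence: 0.25\n",
]
print("Average spam confidence:", average_spam_confidence(sample))
```

A student program would pass the name typed by the user to average_for_file. Keeping the calculation in a separate function makes it easy to test on a few lines before running it on the full file.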
Unit 4. Data Acquisition
Revision Date: Oct 13, 2015 (Version 1.2)
Pre-lesson Preparation
Your students will need computers for this lesson. If you would like to show students a working dartboard simulation (with a circular dartboard), check that your browser can run a Java plug-in. Be sure to update, activate, and disable the plug-in as needed for security purposes.
Summary
In this lesson, students will explore basic data analysis concepts in Python, learn about code extensibility, create a simple simulation from scratch, and reuse their code to make a more elaborate simulation.
Outcomes
Overview
Session 1:
Session 2:
Part of this lesson was adapted from http://www.nzmaths.co.nz/resource/dartboards and http://www.nzmaths.co.nz/resource/more-dartboards.
The development of a program from scratch to solve a specific problem is presented to students by creating a simulation that lets them see how software can model a real-world process. Additionally, the concepts of extensibility and code reuse are shown through hands-on programming experience.
Student computer usage for this lesson is: required
The Lesson Resources folder contains an example program showing how to use Python's random function to simulate tossing a coin.
An alternative lesson outline using Runestone and PyCharm to code a simulation and use it to develop, refine, and test hypotheses is in the lesson folder. The lesson is in a file named "Monte Carlo Simulation to Calculate Pi.docx".
Think-Pair-Share: Writing programs "from scratch"
Get students' attention by asking them to play with the Dartboard Simulator (requires the Java browser plug-in) as they think about the following scenario and answer the related questions:
Suppose you want to write a program that simulates tossing virtual darts. Each dart will land at a point on a square virtual dartboard that is one unit long on each side. Each point on this dartboard has both an x and a y coordinate, both of which are between 0 and 1. The bull's-eye is a square in the center of the dartboard with sides of length 0.5 units.
Have the program ask the user how many darts they want thrown. The program should then simulate throwing these darts by generating a random landing location (a random x and a random y coordinate) on the dartboard for each dart. Recall how to use Python's random functions (by reviewing the previous lesson's dice simulation). As darts are thrown, the program counts how many darts land within the center square, the bull's-eye. The bounds of the bull's-eye are [0.25, 0.75] on both the x and y axes. Finally, the program should print out the number of darts thrown and the number that landed within the bull's-eye.
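The sketch below shows one way the finished square dartboard program could be organized, for teacher reference. The function names are hypothetical, and the number of darts is fixed here so the example runs without user input; a student program would read it from the user instead.

```python
import random


def in_bulls_eye(x, y):
    """True if the point lies in the central square [0.25, 0.75] on both axes."""
    return 0.25 <= x <= 0.75 and 0.25 <= y <= 0.75


def throw_darts(number_of_darts):
    """Throw darts at the unit square; return how many hit the bull's-eye."""
    hits = 0
    for _ in range(number_of_darts):
        x = random.random()  # random float in [0.0, 1.0)
        y = random.random()
        if in_bulls_eye(x, y):
            hits += 1
    return hits


darts = 1000  # a student program would ask the user for this value
hits = throw_darts(darts)
print("Darts thrown:", darts)
print("Darts in the bull's-eye:", hits)
```

Separating the hit test from the throwing loop is a design choice that pays off in the next session, when only the hit test needs to change for a different dartboard.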
Have your students answer the following questions:
Take the rest of the class time to have your students begin programming their simulation. If they are not able to finish before the session ends, you may want to assign the program as homework, or devote the beginning of the second session to finishing the program. They will need their programs for the work in the next session.
You may want to remind students how to use Python's random function. The following code may be a useful example:
Example random coin flipping code (the python file is available in the Lesson Resources folder):
import random  # Needed for random number generation

number_of_heads = 0
for i in range(100):
    x = random.random()  # Generates a random floating point (decimal) number from 0 up to (but not including) 1
    if x > 0.5:  # Treat the upper half of the range as heads
        number_of_heads = number_of_heads + 1
print("The number of heads in 100 coin flips is", number_of_heads)
Have your students finish their dartboard programs, as they are needed in the next session. Alternatively, if they have finished their programs, you could assign the "Collecting and Analyzing Data" think-pair-share of the next session as homework, to be discussed in the next session.
Journal: Making your programs extensible
Think-pair-share: Collecting and analyzing data
Discuss with your students how we often want to reuse our code for a new project, and how it is not uncommon for the requirements to change while a program is being developed. Both situations benefit from extensibility in code. Your students will get to test the extensibility of their dartboard program by reusing what they have to fit a new objective: make a three-ringed circular dartboard.
As before, this program should first ask a user how many darts they would like to throw. Then, it should use that input to simulate throwing darts at a circular dartboard. Finally, it should print the number of darts thrown, the number of darts that hit the bull's-eye, the number that hit the middle ring, the number that hit the outer ring, and the number that missed completely. This circular dartboard is similar to the one from the previous session's robot exercise: it has a central circular bull's-eye surrounded by a middle ring, which is in turn surrounded by an outer ring. The coordinates for this dartboard are: the center is at coordinate (0,0); the outermost ring is a circle with radius of 3; the middle has a radius of 2; and the bull's-eye has a radius of 1. Simulate throwing a dart by picking random x and y coordinates, each between -3 and 3. Since this range is a square, some darts may miss the dartboard completely. For this program, students may reuse as much of their square dartboard code as they need, but make sure to preserve their original program separately.
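A possible shape for the circular dartboard program is sketched below for teacher reference. The function names and region labels are hypothetical, and treating a dart that lands exactly on a boundary as belonging to the inner region is an assumption the lesson does not specify.

```python
import math
import random


def classify(x, y):
    """Name the region hit by a dart at (x, y); the board is centered at (0, 0)."""
    distance = math.hypot(x, y)  # distance of the dart from the center
    if distance <= 1:
        return "bull's-eye"
    if distance <= 2:
        return "middle ring"
    if distance <= 3:
        return "outer ring"
    return "miss"


def throw_darts(number_of_darts):
    """Throw darts at the square from -3 to 3 and count hits per region."""
    counts = {"bull's-eye": 0, "middle ring": 0, "outer ring": 0, "miss": 0}
    for _ in range(number_of_darts):
        x = random.uniform(-3, 3)
        y = random.uniform(-3, 3)
        counts[classify(x, y)] += 1
    return counts


darts = 1000  # a student program would ask the user for this value
results = throw_darts(darts)
print("Darts thrown:", darts)
for region in ("bull's-eye", "middle ring", "outer ring", "miss"):
    print(region + ":", results[region])
```

Compared with a square dartboard program, only the coordinate range and the hit test change, which is a concrete illustration of extensibility to point out to students.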
Have your students finish their circular dartboard programs and answer the following questions:
Note: graphics.py may be used with this lesson to create a visualization - http://mcsp.wartburg.edu/zelle/python/ppics2/code/graphics.py
The students will produce two simulation programs on their own: the square dartboard simulation and the circular dartboard simulation.
The students will record their understanding of extensibility in their journal.
The students will gain experience collecting data in both sessions.
The students will think analytically about their programs by answering the questions in Session 1 and in the homework for Session 2.
Unit 4. Data Acquisition
Revision Date: Dec 01, 2015 (Version 1.2)
Note: This optional lesson is designed for advanced students and teachers who would like to introduce another programming tool and environment: NetLogo. Teachers may also choose to complete only the first session (on the basics of NetLogo), to expose students to a new computational platform and way of thinking, and to extend the ideas in Unit 4 about modeling and simulation.
Summary
This lesson teaches students to use simulations to develop and refine hypotheses, then use the simulations to test these hypotheses. NetLogo, which is used throughout the lesson to illustrate the use of functional and data abstraction, is a programmable modeling environment for simulating natural and social phenomena.
NetLogo uses an extension of Logo instead of Python, so students are not expected to write new code in this lesson. See http://www.ianbicking.org/docs/PyLogo_lightning.html for a comparison of Logo and Python.
Outcomes
Students will understand that models are abstractions of real environments and will recognize the rationale for and limitations of modeling techniques to analyze problems.
Students will recognize the use of functional and data abstractions in modeling.
Students will be able to develop and test hypotheses using an experimental approach in a modeling framework.
Overview
Session 1 - Modeling in NetLogo
Session 2 - Models and Hypothesis Design
Session 3 - Hypothesis Testing
Student computer usage for this lesson is: required
NetLogo. http://ccl.northwestern.edu/netlogo/. Center for Connected Learning and Computer-Based Modeling, Northwestern University. Evanston, IL.
Modeling and Simulation 101 video ( https://www.youtube.com/watch?v=X-6zxImekOE )
Introduce modeling and simulation using the first four minutes of the Modeling and Simulation 101 video ( https://www.youtube.com/watch?v=X-6zxImekOE ). Students open a document for notes for today's session.
Students should record and briefly discuss these four statements about modeling and simulation:
To start, all students should download NetLogo from this link: http://ccl.northwestern.edu/netlogo/.
Students should work through the NetLogo tutorial packet either in groups or as a class. There will be some "thought questions" throughout that students should discuss in their groups and as a class.
Students should complete an exit ticket listing one interesting idea they learned, or one question they have about NetLogo or modeling.
Review yesterday's NetLogo lesson and ask the students to share what they learned, how NetLogo is similar to or different from Python, and any questions they have about how it works.
Ask each student to write a hypothesis that can be tested with this simulation, share the hypothesis with elbow partners, and briefly experiment with the parameters to informally test the hypothesis.
Note: The "Hypothesis Testing Worksheet", which will be used for the next two lessons, is available in the Lesson Resources Folder.
For the rest of today's session and Session 3, students will work in teams of four students to select a model to experiment with, then divide into two partner sets to develop a hypothesis, devise an experimental plan, test the hypothesis, and write about their results.
Directions
Partners should revisit their hypotheses, and choose one hypothesis to focus on first. (They can test both hypotheses if they have time.) Each partner pair should write the name of their model and their selected hypothesis on the board, to share with the other students.
For the next twenty to thirty minutes, students should carry out their experiments and record the appropriate measurements.
At the end of the section or for homework, students should write up their findings in a short report, showing the data they've collected (optionally in a graphical form, particularly if assigned as homework), discussing what the data says about their hypothesis, and concluding whether the hypothesis is supported or refuted by the simulation.
Students should come back into their teams to share their findings, and discuss the advantages and disadvantages of using models and simulations to develop and test hypotheses.
Students will share and post their hypotheses before testing and sharing the results. Teachers will verify that the hypotheses are falsifiable and testable by the simulations.
Students will select a model, develop a hypothesis, design an experiment, and use a simulation to test the hypothesis.
Unit 4. Data Acquisition
Revision Date: Nov 10, 2015 (Version 1.2)
Pre-Lesson Preparation: Students need to have already chosen a topic and had it approved by the instructor. Students can use the following sources to help choose a data set:
http://www.data.gov/ , http://data.princeton.edu/wws509/datasets , http://www.statsci.org/datasets.html
Summary:
This lesson is the summative assessment for Unit 4 on Data Analysis. Students will select a data set and write a small Python program to analyze the data. Students will then write a summary of their findings to demonstrate understanding of the data analysis process.
Outcomes:
Overview:
Students will demonstrate their understanding of the process of collecting and evaluating data.
Student computer usage for this lesson is: required
Rubric provided on Google Drive - Rubric - Unit 4 Summative Assessment.htm in the lesson folder.
Verify that every student has selected a topic (approved by the instructor in advance) and address what the goal is for today.
Students will, either individually or in pairs (instructor's decision), create a small program that reads data from a file, analyzes it, creates a simple simulation, and finally writes data to a file.
Presentation about the expectations of the homework assignments.
Each student should create a 2-page typed summary that explains the following areas:
Instructor has the option to have students work individually or in pairs for this assessment.
Review Rubric with class and clarify expectations.
Students will be assigned a unit project, with a topic of their choice, to demonstrate their understanding and mastery of the concepts of data collection and analysis.