For this session, use the presentation "Finding Data" in the Lesson Resources Folder.
Getting Started (5 min)
Given this data: [slide 1]
A blood drive at the local high school reveals that 20% of the students were HIV positive.
Journal on these questions:
- What is your immediate reaction?
- What questions do you have?
Activities (40 minutes)
Activity 1 (5 min) - Discuss the journal prompt
Lead the students in discussion using the bullets below and slide 2 of the PowerPoint as guidance. Students should talk about WHY they assumed the data was true or were uncomfortable questioning the truth of the data.
Activity 2 (5 min) - Brainstorm: What kinds of data can be found online?
Part 1 - Discussion
Data comes from many places and takes many forms [slide 3]
- Have students discuss: How do businesses, individuals, governments, and devices create and use data?
- Do computers perceive and store data in the same way that humans do?
- Additional research is needed to understand the exact nature of the relationship.
Part 2 - Brainstorm
Brainstorm as a class: what kinds of data are generated? Possible answers:
- video: movies, webcam images, CCTV, YouTube, Netflix, Facebook, etc.
- pictures: maps, Instagram, photos, cartoons, drawings, ... everything!
- words: books, articles, news, stories, blogs, Facebook
- numbers: facts, financial transactions, scientific data
- sound: music, speech
- behavior tracking: GPS, click behavior, search history
- IMPORTANT POINT: Computers see and record digital data, which is only an approximation of the real world. The sample rate determines the accuracy of the digital approximation. The real world is analog. Analog data have values that change smoothly, rather than in discrete intervals, over time. Some examples of analog data include pitch and volume of music, colors of a painting, or position of a sprinter during a race.
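To make the sample-rate point concrete, here is a minimal Python sketch. It assumes a hypothetical analog signal (a smooth 1 Hz sine wave) and made-up helper names (`analog_signal`, `sample_signal`, `largest_jump`); none of these come from the lesson materials. It records the same signal at two sample rates and compares how large the gaps between neighboring samples are.

```python
import math

def analog_signal(t):
    """Hypothetical analog signal: a smooth 1 Hz sine wave at time t (seconds)."""
    return math.sin(2 * math.pi * t)

def sample_signal(sample_rate, duration=1.0):
    """Record the signal at evenly spaced moments (sample_rate samples per second)."""
    count = int(sample_rate * duration)
    return [analog_signal(i / sample_rate) for i in range(count)]

def largest_jump(samples):
    """Largest change between neighboring samples: a rough measure of lost detail."""
    return max(abs(b - a) for a, b in zip(samples, samples[1:]))

low = sample_signal(4)     # 4 samples per second
high = sample_signal(100)  # 100 samples per second

print(len(low), "samples, largest jump:", round(largest_jump(low), 3))
print(len(high), "samples, largest jump:", round(largest_jump(high), 3))
```

The higher sample rate stores more values and leaves smaller jumps between them, so the digital copy follows the smooth analog curve more closely.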
Activity 3: How is meaning created from data? (10 minutes)
- Look at some data gathered about selfies from different cities around the world. [slide 4]
- Main ideas:
- You have to gather the data and analyze it to create meaning.
- Creating meaning from pictures still takes some human interpretation.
- Digitally processed data can show a correlation between variables, but a correlation found in data does not necessarily indicate that a causal relationship exists.
- A single source often does not contain the data needed to draw a conclusion. It may be necessary to combine data from a variety of sources to formulate a conclusion.
- Prompt students to come to a conclusion about the graphed data on the page.
- Question for discussion: How large a sample is needed to draw a conclusion?
- Quick review: Make the point that there is a LOT of data even in a single picture. [slide 5]
- Define these and put them in order. Use this webpage to review bytes: http://highscalability.com/blog/2012/9/11/how-big-is-a-petabyte-exabyte-zettabyte-or-a-yottabyte.html
- MB, bit, TB, ZB, byte, GB, pixel (one dot of color on the screen), KB, PB
- Look at the photo on slide 5.
- 365 gigapixels is 365 billion pixels. If the picture were a square, it would be about 604,152 pixels on each side (too big to fit on any HDTV screen).
- A 4K ultra-high-resolution TV is only 3,840 x 2,160 pixels (http://www.rtings.com/info/what-is-the-resolution).
- Even a movie screen can’t show all of the detail (https://www.amctheatres.com/sony4k); you can only look at it one part at a time.
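The arithmetic behind this review can be checked with a short Python sketch. It assumes decimal prefixes (each unit is 1,000 times the previous one) and a 4K UHD screen of 3,840 x 2,160 pixels; the names `UNITS` and `bytes_in` are invented for illustration, and EB (exabyte) is added to the list only because it sits between PB and ZB.

```python
import math

# Data sizes from smallest to largest, assuming decimal (powers of 1,000) prefixes.
# A bit is smaller still: 8 bits make 1 byte.
UNITS = ["byte", "KB", "MB", "GB", "TB", "PB", "EB", "ZB"]

def bytes_in(unit):
    """Number of bytes in one of the given unit."""
    return 1000 ** UNITS.index(unit)

pixels = 365 * 10**9        # 365 gigapixels
side = math.isqrt(pixels)   # whole-pixel side length if the photo were a square
tv_pixels = 3840 * 2160     # pixels on one 4K UHD screen (assumed resolution)

print("Side of a square 365-gigapixel image:", side, "pixels")
print("4K screens needed to show every pixel:", math.ceil(pixels / tv_pixels))
print("Bytes in 1 TB:", bytes_in("TB"))
```

Students can change `pixels` or `tv_pixels` to compare other photos and screens.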
- Preview Wolfram Alpha, an engine for providing knowledge from data.
- Show the introductory video: https://www.wolframalpha.com/tour/what-is-wolframalpha.html (1:18) [slide 6]
- Identify ways that patterns can emerge when data are transformed using programs.
- Experiment to demonstrate how insight and knowledge can be obtained from translating and transforming digitally represented information.
- Students will explore these sites in the next activity.
- Point out that there are processes that can be used to extract or modify information from data in both beneficial and harmful ways. These processes include the following:
● machine learning and data mining
● transforming every element of a data set, such as doubling every element in a list, or extracting the parent’s email from every student record
● filtering a data set, such as keeping only the positive numbers from a list, or keeping only students who signed up for band from a record of all the students
● combining or comparing data in some way, such as adding up a list of numbers, or finding the student who has the highest GPA
● visualizing a data set through a chart, graph, or other visual representation
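The transforming, filtering, and combining processes listed above can be sketched in a few lines of Python. The student records and the number list below are invented purely for illustration; they are not part of the lesson materials.

```python
# Hypothetical data, made up for this example.
numbers = [4, -2, 7, -9, 1]
students = [
    {"name": "Ana", "gpa": 3.9, "band": True, "parent_email": "ana.parent@example.com"},
    {"name": "Ben", "gpa": 3.2, "band": False, "parent_email": "ben.parent@example.com"},
    {"name": "Cara", "gpa": 3.6, "band": True, "parent_email": "cara.parent@example.com"},
]

# Transforming: apply the same change to every element.
doubled = [n * 2 for n in numbers]
parent_emails = [s["parent_email"] for s in students]

# Filtering: keep only the elements that meet a condition.
positives = [n for n in numbers if n > 0]
band_members = [s["name"] for s in students if s["band"]]

# Combining or comparing: reduce many values to one result.
total = sum(numbers)
top_student = max(students, key=lambda s: s["gpa"])["name"]

print(doubled, positives, total)
print(parent_emails, band_members, top_student)
```

Each process leaves the original data untouched and produces a new result, which makes it easy to chain steps, for example filtering first and then combining.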
Activity 4 (20 min) - Work with some data online
- Students should complete the Data Search and Analysis Handout. [slide 7]
- Depending on how much time you have, you can pair students and assign even/odd questions or chunks of questions to different groups, or have each student research on their own.
- If there’s time in class, try to go over results and compare (especially the first 5) to see if people got similar answers. Why or why not? [slide 8]
Assign Homework (5 minutes)
Give students the worksheet: Homework Unit 4 Lesson 1.
There are 10 videos to choose from, each 10-15 minutes long. Either allow students to self-select, or assign them a particular video. Students should watch the video and answer the questions on the worksheet. This is an opportunity to discuss plagiarism: students are expected to watch the video and write from their own experience.
For this session, use the presentation "Finding and Analyzing Data" in the Lesson Resources Folder.
Getting Started (5 min)
Students should journal on the following: Describe at least 2 ways that we create meaning out of data. [slide 1]
- Possible answers: graph it, total it, average it, find min and max, map it, compare it to other data, find trends, generate predictions (like weather), draw conclusions (facial recognition, emotions, voice inflection), diagnose diseases, discover new stars, etc.
Activities (40 min)
Activity 1 (35 min): Analyzing Data
Part 1: Correlation vs. Causation
- Look at slide 2 from the PowerPoint. Creating meaning from data can be misleading.
- Point out that the graph shows a direct relationship between the number of divorces in Maine and the amount of margarine that is purchased. When one goes up, the other does too, and vice versa. Is this a causal relationship?
- Show some examples from the Tyler Vigen website http://www.tylervigen.com/spurious-correlations . It has many examples of data correlations that may be statistically valid but don’t make sense. The site was created to point out that conclusions drawn from correlation alone are often not valid.
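For classes that want to see how a correlation is actually measured, here is a minimal Python sketch of the Pearson correlation coefficient. The two series are made-up numbers for two unrelated quantities that both happen to decline over five years; they are not the real figures from the slide or from the Tyler Vigen site, and the function name `pearson` is our own.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient: +1 means the series rise and fall together."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)

# Invented yearly values for two unrelated quantities (illustration only).
series_a = [8.2, 7.0, 6.5, 5.3, 4.6]
series_b = [5.0, 4.7, 4.4, 4.3, 4.1]

r = pearson(series_a, series_b)
print("Correlation:", round(r, 2))
```

A value close to 1 says only that the two lists move together. Nothing in the calculation knows or checks whether one quantity causes the other.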
Part 2: Data Science
- What does a data scientist do? [slide 3] Show the two videos and discuss.
- Say: Tricks to analyzing big data:
- Knowing what data to use, and what to disregard.
- Dealing with non-uniform data, or data entered in a variety of formats
- Knowing how to clean data - to make the data conform to a format without changing its meaning.
- Knowing how to use data filtering tools to find information, recognize patterns and predict trends.
- Look at 3 false assumptions about big data [slide 4]:
- It’s complete and accurate
- It tells the whole story
- Bigger is better
- What considerations and tradeoffs arise in the computational manipulation of data? [slide 4]
- How do you account for missing data?
- How do you certify your sources?
- How do you decide which data to include and which to exclude?
- How much data is enough? The size of a data set affects the amount of information that can be extracted from it.
- Are your processing algorithms accurate?
- What is some of the data needed to successfully fly a space mission? (Possible answer: Knowing all about the spacecraft: speed, direction, amount of fuel/oxygen left.) Many of the problems that applied to early space missions also arise in dealing with big data.
- You need to decide which factors to include in your calculations, and which to exclude.
- You need to decide when to make an assumption for missing data or when to estimate.
- When writing a program for an early space flight, there are many unknown factors, because the spacecraft has never flown before.
- It’s usually impossible to create a perfect algorithm that can take into account every possibility, so how do you allow for errors and changes?
- What are some of the calculations needed? (Possible answers: how much fuel to release and with which engines.)
- They had to run many simulations first to see what would happen under various circumstances.
- See if anybody knows how Netflix, movie makers, or Amazon use data about their customers to be more successful. [slide 5] http://www.smartdatacollective.com/bernardmarr/312146/big-data-how-netflix-uses-it-drive-business-success and http://www.fastcompany.com/3024655/pitch-perfect-and-how-analytics-are-transforming-movie-marketing
Businesses like Amazon and Netflix learn the habits of different customers and make recommendations based on their previous choices and on the choices of others who share similar characteristics (like Google ads).
See if anybody knows the story of Moneyball (based on a true story), in which a baseball team made decisions based on data analysis to become winners: https://en.wikipedia.org/wiki/Moneyball_(film). A second example is Vivek Ranadivé, who knew little about basketball but owned a multi-million dollar computer processing company and knew how to choose and analyze data. He coached his then twelve-year-old daughter’s National Junior Championship basketball team to the national championship game. He relied on his knowledge of soccer and cricket, paired with his analytic mindset, to create a system of play that allowed his relatively un-athletic team to excel. After using his intellect and business experience to coach an inexperienced team to the championship game, the man who once thought basketball was “mindless” was hooked on the sport. http://www.newyorker.com/magazine/2009/05/11/how-david-beats-goliath
- How is data analyzed? Data analysis requires an algorithm, a plan to collect and process data. [slide 6]
- Generate a discussion about what data is collected and how it is analyzed. What is a possible algorithm for choosing which movies Netflix might suggest to a customer?
Brainstorm: what other data might they collect? (what’s currently popular in that age group, demographic, etc.)
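One possible answer to the algorithm question, sketched in Python. This is a deliberately simple illustration, not how Netflix actually works: it assumes the only data collected is each customer's list of watched titles, and the customers, titles, and function name `recommend` are all invented.

```python
# Hypothetical viewing histories (made-up customers and titles).
histories = {
    "you":    {"Space Race", "Robot Heist", "Ocean Deep"},
    "user_a": {"Space Race", "Robot Heist", "Mars Colony"},
    "user_b": {"Baking Stars", "Garden Rescue"},
    "user_c": {"Ocean Deep", "Space Race", "Mars Colony", "Reef Patrol"},
}

def recommend(customer, histories, top_n=2):
    """Suggest unwatched titles, favoring those watched by customers with similar taste."""
    watched = histories[customer]
    scores = {}
    for other, titles in histories.items():
        if other == customer:
            continue
        overlap = len(watched & titles)      # shared titles measure similarity
        for title in titles - watched:       # only titles the customer has not seen
            scores[title] = scores.get(title, 0) + overlap
    # Highest score first; ties broken alphabetically so results are repeatable.
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return [t for t in ranked if scores[t] > 0][:top_n]

print(recommend("you", histories))
```

Students can extend the sketch with other data from the brainstorm, such as age group or what is currently popular, and discuss how each addition would change the scores.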
- Choose one of the options and write an outline of an algorithm: choosing a movie to produce or a sports player to hire. [slide 7]
Share and discuss.
- Describe at least two calculations needed
- Describe some of the data you’d need to collect.
Describe how the data sets needed could pose challenges regardless of size, such as:
- the need to clean data
- incomplete data
- invalid data
- the need to combine data sources
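These challenges can be illustrated with a small Python sketch. It assumes two hypothetical data sources that record student names and GPAs in inconsistent formats, and it assumes a GPA is valid only between 0.0 and 4.0; the records and the helper `clean_record` are invented for this example.

```python
def clean_record(raw):
    """Return a cleaned (name, gpa) pair, or None if the record is unusable."""
    name = raw.get("name", "").strip().title()   # same format, same meaning
    gpa_text = str(raw.get("gpa", "")).strip()
    if not name or not gpa_text:
        return None                              # incomplete data
    try:
        gpa = float(gpa_text)
    except ValueError:
        return None                              # invalid data: not a number
    if not 0.0 <= gpa <= 4.0:
        return None                              # invalid data: out of range
    return (name, gpa)

# Two hypothetical sources with non-uniform formatting.
source_a = [{"name": "  ana LOPEZ ", "gpa": "3.9"}, {"name": "Ben Wu", "gpa": ""}]
source_b = [{"name": "cara diaz", "gpa": 3.4}, {"name": "Dev Rao", "gpa": "47"}]

# Combine the sources, clean each record, and drop the unusable ones.
combined = source_a + source_b
cleaned = [rec for rec in (clean_record(r) for r in combined) if rec is not None]

print(cleaned)
print("Dropped", len(combined) - len(cleaned), "of", len(combined), "records")
```

Dropping records is itself a tradeoff worth discussing: the cleaned set is more trustworthy but smaller, and an analyst could instead choose to estimate the missing values.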
Activity 2 (5 min):
Students present their homework from the previous day, which was based on the TED talks on data that they watched. [slide 8]
Summarize all of the questions from the homework to be presented to the class and collect the written summaries to grade.
Journal (5 min)
In your writing journal, map out the steps to answer a specific question or solve a specific problem using data.
Guidance for Practice Questions - Question Set 17
Questions in the AP Classroom Question Bank may be used for summative purposes.
Sixty of the 80 questions are restricted to teacher access. The remaining 20 questions are from public resources.
Questions are identified by their initial phrases.
A student is recording a song on her computer. ...
An Internet service provider (ISP) is considering
Digital images are often represented by the red...
Which of the following is a true statement abou...
Data analysis activities from NOAA, NASA, and more! - http://climate-expeditions.org/educators/activities.html
What is data acquisition? - http://www.ni.com/data-acquisition/what-is/
Data analysis and graphs (with Excel sample) - http://www.sciencebuddies.org/science-fair-projects/project_data_analysis.shtml
Collecting and analyzing data - http://ctb.ku.edu/en/table-of-contents/evaluate/evaluate-community-interventions/collect-analyze-data/main
Using Excel for Handling, Graphing, and Analyzing Scientific Data: A Resource for Science and Mathematics Students - http://academic.pgcc.edu/psc/Excel_booklet.pdf