Lesson Summary

Summary

Big Data has been defined in many different ways. Easy access to large sets of data and the ability to analyze large data sets changes how people make decisions. Students will explore how Big Data can be used to solve real-world problems in their community. After watching a video that explains how Big Data is different from how we have analyzed and used data in the past, students will explore Big Data techniques in online simulations. Students will identify appropriate data source(s) and formulate solvable questions.

Students are introduced to parallel and distributed programming and to their use with big data.

Learning Objectives

CSP Objectives

  • EU CRD-2 - Developers create and innovate using an iterative design process that is user-focused, that incorporates implementation/feedback cycles, and that leaves ample room for experimentation and risk-taking.
    • LO CRD-2.J - Identify inputs and corresponding expected outputs or behaviors that can be used to check the correctness of an algorithm or program.
  • EU DAT-2 - Programs can be used to process data, which allows users to discover information and create new knowledge.
    • LO DAT-2.A - Describe what information can be extracted from data.
    • LO DAT-2.C: - Identify the challenges associated with processing data.
    • LO DAT-2.D - Extract information from data using a program.
    • LO DAT-2.E - Explain how programs can be used to gain insight and knowledge from data.
  • EU CSN-2 - Parallel and distributed computing leverage multiple computers to more quickly solve complex problems or process large data sets.
    • LO CSN-2.A - For sequential, parallel, and distributed computing: a. Compare problem solutions. b. Determine the efficiency of solutions.
    • LO CSN-2.B - Describe benefits and challenges of parallel and distributed computing.

Key Concepts

Outcomes

  • Students will explain how analyzing Big Data is different from the way ordinary data is analyzed.
  • Students will provide examples of how the scalability of systems is an important consideration when working with data sets, as the computational capacity of a system affects how data sets can be processed and stored.
  • Students will describe how computers can make predictions and answer questions through the use of Big Data, storage of data, and processing data.
  • Students will synthesize the relationship(s) between causation and correlation
  • Students will describe the benefits and challenges of parallel and distributed computing.

Teacher Resources

Lesson Plan

Session 1 - What is Big Data?

Getting Started (10 min) - Journal

Journal: How can a computer gather data from people? (Think-Pair-Share)

Remind students of the mind guessing game: http://en.akinator.com/ or 20 questions http://www.20q.net/

Discuss: How can the computer learn from people when playing one of these games? How many different answers do you think it could possibly know?

Teacher note: students are not expected to actually play this game during class.

  • Students should document in their journal the answer to this question: How does this game store all of the possible answers?
  • Ask 3 students to share their answers. (Possible strategies to select a random student: random.com, or pick a random student name stick from a cup.)
  • This activity should lead to today’s lesson on how large amounts of data are stored and then accessed as needed in a system.

Guided Activities (30 min) - Reading & Video

Read The Rise of Big Data in chunks: An Introduction to “Big Data” (20 mins) Reading can be found at: http://www.foreignaffairs.com/articles/139104/kenneth-neil-cukier-and-viktor-mayer-schoenberger/the-rise-of-big-data

  1. Break students into groups or pairs and jigsaw the seven units of the reading. Each group is to summarize their section in a tweet-sized comment (not more than 140 characters).
  2. Share tweets with the class.
  3. Explain to students that big data is impacting every area of life. By using more data and processing power we can make better decisions. As an illustration, show a clip from the movie Moneyball: (3 mins) https://www.youtube.com/watch?v=rMObWsKaIls
  4. After students watch, they create a journal entry explaining at least two ways data was used to better manage the baseball team. Partners discuss journal entries. Share at least one observation with table groups and then share at least one observation from each group with the class.
  5. Challenge students to think of evidence that would indicate that the size of a data set affects the amount of information that can be extracted from it. (prompt with counterexamples: if you wanted to predict best movies and only polled senior citizens; if you wanted to offer a new clothing style and only got data from people living in Alaska, etc. why is there more information available the bigger the data set gets?)

Show the first 3-5 minutes of this clip. (It becomes a bit dry, so just show the amount that is appropriate for your students to get the idea): https://www.youtube.com/watch?v=7D1CQ_LOizA

  • What are the 3 V's? List some details about each V.
  • (at 4:40) What is Hadoop and how is it used?
  • Identify appropriate data source and form questions
  • Extract data source into a format supported by underlying tools
  • Normalize data (remove redundancies, irrelevant details)
  • Make data uniform (change abbreviations, typos, capitalization inconsistencies, etc.)
  • Import data into a tool
  • Perform analysis
  • Visualize results

Some examples of how big data is used appropriately:

  • Netflix and Amazon use it to improve user recommendations
  • Dominos used it to determine that more people order pizza when it is raining so they now base some of their ad campaigns around weather patterns
  • Police use it to predict when and where crimes will appear

Some examples of how big data was inappropriately used:

  • In 2012 Target store's “outing” a teenager’s pregnancy
  • In 2012 Google spent 22.5 million on a settlement over allegations that they secretly tracked user’s web surfing
  • In 2012 Facebook spent 20 million to settle a lawsuit that alleged that they used user pictures without the user’s knowledge to endorse products that they “liked”
  • In 2013 the revelation of the NSA using big data for national security concerns

Wrap Up (10 min) – Group Review

  • Place students in groups of 4 where the first student is A, the next is B, etc. Each group creates a single sheet of paper with the letters across the bottom and the numbers 1 - 5 to demonstrate the level of understanding of each concept. Students should plot their understanding of each respective concept (see list below) on the graph. (The file "BigDataSampleDotGraph" in the lesson resources folder shows an example.)
  • Collect this graph as a level of student’s understanding of the concepts in the video.
  • Review each concept using the notes below.
    • Big data is kind of like drinking water from a fire hose. It's too much to process for a small pipeline…
    • A. The three “Vs”: Volume, Variety, and Velocity
    • B. Big Data processing steps:
    • C. Tools for processing big data:
      • Microsoft Excel (or some type of spreadsheet tool - i.e. Calc is another one)
      • Hadoop - a well known big data tool, provided by Apache, requires extensive programming knowledge to set up and use
      • SAS - provides a more intuitive interface and better graphical representations of data
      • Google Prediction API - takes advantage of machine learning to extract meaning from data
      • BitDeli - lightweight, easier to use version of Hadoop
    • D. Very few restrictions on the use of big data
      • Companies collect large amounts of data on their customers
      • Can be sold to other companies
      • Can be sold to the government
      • Can be used to “de-anonymize” someone

Homework

Students are to pick three topics they want to research that use big data. It is preferred that these topics relate to something learned this year in the course (e.g., the need for IPv6). Tomorrow, as the students enter class, they will sign up on a list with their chosen topic. Since the students will have three options, it is likely they will get one of their selected topics to research.

Session 2 - Where can Big Data be used?

Getting Started (5 min) - Journal

Journal: Think about your daily and weekly activities. What types of data are being stored about you?

Remind students to think about what they do online, in stores, while in a car, etc.

Guided Activities (10 min) – Processing Big Data

Review the steps to processing Big Data:

  1. Identify appropriate data source and form questions
  2. Extract data source into a format supported by underlying tools
  3. Normalize data (remove redundancies, irrelevant details, etc. as needed)
  4. Make data uniform (user-entered data may include abbreviations, spelling errors, or inconsistent capitalization without changing the meaning of the data) 
  5. Import data into a tool
  6. Perform analysis
  7. Visualize results

As a class, walk through these steps using the two files in the lesson resources folder (FailedBanklist.csv & Consumer_Complaints.csv)

Step 1.

Demonstrate how files such as these can be obtained at http://catalog.data.gov/dataset

Formulate questions such as:

  1. How does the size of a data set affect the amount of information that can be extracted from it?
  2. Are there any banks that are on both the complaint list and the failed bank list?
  3. Can we make some deductions about banks that may be on both lists? If so, what deductions can we make?

Step 2.

Extract data source into a format supported by underlying tools

Open one of these files in Notepad (or some simple editing program such as Notepad++) and demonstrate how the actual data itself is separated by commas, thus the file name “csv” for comma-separated value.

Open both files in Microsoft Excel. Complete a find for the bank name “Banco Popular de Puerto Rico” on both lists. You may want to first sort the data by bank name to find this bank or you can use CTRL + F to find the bank name (see screenshots below).

Steps 3 & 4.

Normalize data (remove redundancies, irrelevant details)

In this step, there is technically no need to remove redundancies or irrelevant details but you can show the students how you could remove data or limit the data to a particular data set. For example, if were to want to look at only the banks from Maryland, you can use the filter tool to only view those banks from MD.

Make data uniform (user-entered data may include abbreviations, spelling errors, or inconsistent capitalization without changing the meaning of the data) 

Cleaning data: Depending on how the data was collected, it may not be uniform. Therefore the data may need to be cleaned before it can be processed. Cleaning the data is the process that makes the data uniform without changing its meaning. For example, replacing all equivalent abbreviations, spellings, and capitalizations of the same word.

Step 5.

Import data into the tool

Right now the file type is as a csv file. By resaving the file as a .xlsx file it becomes a true spreadsheet file.

Step 6.

Perform analysis

We have determined that the bank “Banco Popular de Puerto Rico” is on both lists. Now ask the students “Why is this bank on both lists?” Note: On the Failed Bank list the Banco Popular de Puerto Rico is actually an acquiring institution. By looking more closely at the dates of the acquisition of the failed bank “Westernbank Puerto Rico” one can formulate some possible deductions that maybe the reason “Banco Popular de Puerto Rico” is on the complaint list is because they had recently taken over a failed bank. It could be possible that some of these complaints were related to this recent acquisition.

Step 7.

Visualize Results

Explain to students that they will learn more about visualizing their results in Unit 6. They can complete graph visualization in excel. Show them the website: http://www.gapminder.org/. Explain that even though a visualization in excel is not interactive like http://www.gapminder.org/, they can complete some form of visualizing their data by using a spreadsheet. Note: http://www.gapminder.org/ is VERY attention-grabbing. Only briefly show the students what they can do with it (see how data changes over time, look at many different data sets and download data in different forms - including csv and xlsx formats).

Independent Activity (30 min) – Online Research

Students should research their selected topics from homework. Some possible websites for finding data are listed above under “Possible good resource(s) for data collection.”

Students are to get your approval for a topic and then use the Big Data Sets Worksheet in the Lesson Resource Folder to find big data sets that are related to the approved topic.

Wrap Up (5 min) – Exit Slip

Students are to review using http://www.gapminder.org/ looking specifically at life expectancy. Students will write one question after “playing” the timeline of life expectancy using gapminder on an exit slip before leaving class. For example, one may write “Why is the life expectancy of countries such as Denmark, Sweden, & Norway typically higher than other countries throughout most of the timeline?”

 

Session 3 How can big data be processed?

Getting Started

 Challenge students to work in teams of four to find:

  • All of the US States Capital locations (latitude and longitude of each)
  • All of the Consumer Products Currently made by Apple
  • The Top Five Social media Websites in terms of daily users
  • The 10 most common baby names in each of the last five years 

After 2 min stop students and ask them to share how they approached the problem.

Say: You split the task into pieces, and each person worked at the same time to get the job done more quickly than would be possible by yourself. This is parallelism. In computing, parallelism can be defined as the use of multiple processing units working together to complete some task. There are many different kinds of hardware that can serve as a “processing unit”, but the principle is the same: a task is broken into pieces in some way, and the processing units cooperate on those pieces to get the task done. Basic of Processes with Python

Students launch a Python 2 IDE such repl.it or JDoodle and paste the following code adapted from the Basics of Processes with Python web page. 

from multiprocessing import *

def sayHi():

print "Hi from process", current_process().pid

def procEx():

print "Hi from process", current_process().pid, "(parent process)"

otherProcess1 = Process(target=sayHi, args=())

otherProcess1.start()

otherProcess2 = Process(target=sayHi, args=())

otherProcess2.start()

otherProcess3 = Process(target=sayHi, args=())

otherProcess3.start()

### execute

procEx()

 

Students execute the code - and debug if needed.

Review the definition of scalability. Ask how the scalability of systems is an important consideration when working with data sets, as the computational capacity of a system affects how data sets can be processed and stored.

Ask: What do the numbers mean in each line of output? 

Guided Activities 

Activity 1 Parallel Programming 

Explain: 

  1. Parallel processes do not always finish in the same order they started. 
    • The code above follows a common pattern: a parent process creates one or more child processes to do some task. In this example, suppose we call procEx. The first line in that function prints a simple message about what process is running. This is done by calling the function current_process that is defined in the multiprocessing module. The current_process function returns a Process object representing the currently running process. 
  2. Every Process object has a public field called "pid", which stands for “process identifier”.
    • Thus current_process().pid returns the pid for the currently running process. This is what gets printed. By calling the Process constructor, we created a Process object, but we have not yet started a new process. That is, the process exists but is not available to be run yet. The process is actually started with the last line of procEx. 
  3. There are two steps to make a child process do some task:
    • A Process object is created using the constructor, and then the Process object is started using the start method.  When more than one process is started the processes can run at the same time - in parallel. 
  4. All of our programs before today used sequential computing.  Sequential computing solutions take as much time to run as the sum of the time taken by each of their steps.  With parallel computing, the total time needed is a combination of a sequential part and just the longest time taken by each of the parallel parts. Show this Udacity video on the advantages of parallel processing. 

 

Discuss:

Use the questions below as prompts. Students share until the idea that parallel solutions scale more effectively is discussed.

Say:  Computer scientists use the term scaling to mean how a process responds to increases in the size of the project. 

  • How might a parallel computing solution make a process more scalable?
  • Why does a search engine such as Google would use parallel processing to rank the web pages in their database?

 

Activity 2 Distributed Programming 

Say:  In the first activity we saw an example of a program where more than one computer processor on a computer could be used at a time.  Sharing a task among the processors on many computers is an example of distributed problem-solving. Distributed computing shares components of a software system among multiple computers so a large task can be done in less time using the resources available to each of the computers - including both processing and storage resources.  Students read the first 5 paragraphs of the OpenScientist. 

Discuss:  With elbow partners, discuss what are the advantages of distributed computing and choose the top three projects that interest you from the list on the OpenScientist. 

  • Make a list of the 10 most interesting projects to the class. 
  • Ask: Why these projects are better solved using Distributed Computing rather than a single computer?
  • Students share until both time and storage constraints of using a single computer are discussed.

 

Activity 3 Speedup

Say: Parallel and distributed computing are powerful tools but they have their limits in terms of increasing efficiency.

Show this graph of Amdahl’s Law by Daniels220 at English Wikipedia, CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=6678551.

Amdahl's Law

Ask:

  • At what number of processors, does the blue line stop indicating an increase in speed?
  • At what number of processors, does the red line stop indicating an increase in speed?
  • At what number of processors, does the purple line stop indicating an increase in speed?
  • At what number of processors, does the green line stop indicating an increase in speed?

Explain:  The “speedup” of a parallel solution is measured in the time it took to complete the task sequentially divided by the time it took to complete the task when done in parallel. The 4 different colored lines represent four different processes with increasing portions that can be done in parallel.  At first, they all speed up as more processors are used. After a time, adding more processors no longer speeds up each process.  The sequential parts of processing limit the overall time.

Discuss:  With elbow partners, students discuss why they think this is the case. Students share until these key points are verbalized:

  • a large solution consists of parallelizable parts and non-parallelized (sequential) parts 
  • at some point adding more parallel parts no longer adds to the efficiency of the solution


Wrap up:  

Students return to the OpenScientist and choose the project that most interests them.

Say: Parallel computing consists of a parallel portion and a sequential portion. 

Students identify a part of their chosen project that is done in serial and a part that is done in parallel.

 

 


Evidence of Learning

Formative Assessment

Students are to submit a document stating their topic for research using Big Data. This document should answer the questions:

Topic:

How is Big Data used to solve or remedy the topic?

Link(s) used to find Big Data? (i.e. data.gov, etc)


Summative Assessment

How has the transformation of data storage affected how data itself is used?

Answer: Storage and processing of large digital data enables us to analyze large data sets quickly rather than small sampling sizes as used before.

How can a computer use Big Data to make predictions?

Answer: Computers can use smart algorithms, powerful processors, and clever software to make inferences and predictions for solvable questions.

How can parallel processing help scale a solution to a Big Data problem?

Larger problems often have more parallelizable segments. The more processes that can be done in parallel the faster the overall process will become.

Lesson Summary

Summary

Big Data has been defined in many different ways. Easy access to large sets of data and the ability to analyze large data sets changes how people make decisions. Students will explore how Big Data can be used to solve real-world problems in their community. After watching a video that explains how Big Data is different from how we have analyzed and used data in the past, students will explore Big Data techniques in online simulations. Students will identify appropriate data source(s) and formulate solvable questions.

Outcomes

 

  • Students will explain how analyzing Big Data is different from the way ordinary data is analyzed.
  • Students will describe how computers can make predictions and answer questions through the use of Big Data, storage of data, and processing data.
  • Students will synthesize the relationship(s) between causation and correlation.

 

Overview

Session 1- What is Big Data?

 

  • Getting Started (10 min) - Journal
  • Guided Activities (30 min) – Reading and Video
  • Wrap Up (10 min) – Group Review
  • Homework

 

Session 2 – Where can big data be used?

 

  • Getting Started (5 min) - Journal
  • Guided Activities (15 min) – Processing Big Data
  • Independent Activities (25 min) – Online Research
  • Wrap Up (5 min) – Exit Slip

 

 

Learning Objectives

Essential Questions

  • How can computing extend traditional forms of human expression and experience?
  • How are vastly different kinds of data, physical phenomena, and mathematical concepts represented on a computer?
  • How can computation be employed to help people process data and information to gain insight and knowledge?
  • How can computation be employed to facilitate exploration and discovery when working with data?
  • What opportunities do large data sets provide for solving problems and creating knowledge?

Teacher Resources

Student computer usage for this lesson is: required

Reading assignment and video clips for Session 1: 

Possibly useful resource(s) for data collection:

Big Data Concepts:

Sample data sets (both acquired from http://catalog.data.gov/dataset) :

  • FailedBanklist.csv
  • Consumer_Complaints.csv

Lesson Plan

Session 1 - What is Big Data?

Getting Started (10 min) - Journal

Journal: How can a computer gather data from people ? (Think-Pair-Share)

Remind students of the mind guessing game: http://en.akinator.com/ or 20 questions http://www.20q.net/ 

Discuss: How can the computer learn from people when playing one of these games? How many different answers do you think it could possibly know? 

Teacher note: students are not expected to actually play this game during class.

  • Students should document in their journal the answer to this question: How does this game store all of the possible answers?
  • Ask 3 students to share their answers. (Possible strategies to select a random student: random.com, or pick a random student name stick from a cup.)
  • This activity should lead into today’s lesson on how large amounts of data are stored and then accessed as needed in a system.

Guided Activities (30 min) - Reading & Video

Read The Rise of Big Data in chunks: An Introduction to “Big Data” (20 mins) Reading can be found at: http://www.foreignaffairs.com/articles/139104/kenneth-neil-cukier-and-viktor-mayer-schoenberger/the-rise-of-big-data


Break students into groups or pairs and jigsaw the seven units of the reading. Each group is to summarize their section in a tweet sized comment (not more than 140 characters).


Share tweets with the class.


Explain to students that big data is impacting every area of life. By using more data and processing power we can make better decisions. As an illustration, show a clip from the movie Moneyball: (3 mins) https://www.youtube.com/watch?v=rMObWsKaIls
After students watch, they create a journal entries explaining at least two ways data was used to better manage the baseball team. Partners discuss journal entries. Share at least one observation with table groups and then share at least one observation from each group with the class.

Show the first 3-5 minutes of this clip. (It becomes a bit dry, so just show the amount that is appropriate for your students to get the idea): https://www.youtube.com/watch?v=7D1CQ_LOizA

 

  • What are the 3 V's? List some details about each V.
  • (at 4:40) What is Hadoop and how is it used?
  • Identify appropriate data source and form questions
  • Extract data source into format supported by underlying tools
  • Normalize data (remove redundancies, irrelevant details)
  • Import data into tool
  • Perform analysis
  • Visualize results

 

 

Some other concepts to point out to students if there is time:

Some examples of how big data is used:

 

  • Netflix and Amazon use it to improve user recommendations
  • Dominos used it to determine that more people order pizza when it is raining so they now base some of their ad campaigns around weather patterns
  • Help police predict when and where crimes will appear

 

Some examples of how big data was inappropriately used:

 

  • In 2012 Target store's “outing” a teenager’s pregnancy
  • In 2012 Google spent 22.5 million on a settlement over allegations that they secretly tracked user’s web surfing
  • In 2012 Facebook spent 20 million to settle a lawsuit that alleged that they used user pictures without the user’s knowledge to endorse products that they “liked”
  • In 2013 the revelation of the NSA using big data for national security concerns

 

 

 

Wrap Up (10 min) – Group Review

  • Place students in groups of 4 where the first student is A, the next is B, etc. Each group creates a single sheet of paper with the letters across the bottom and the numbers 1 - 5 to demonstrate the level of understanding of each concept. Students should plot their understanding for each respective concept (see list below) on the graph. (The file "BigDataSampleDotGraph" in the lesson resources folder shows an example.)
  • Collect this graph as a level of student’s understanding of the concepts in the video.
  • Review each concept using the notes below.
    • Big data is kind of like drinking water from a fire hose. It's too much to process for a small pipeline…
    • A. The three “Vs”: Volume, Variety, and Velocity
    • B. Big Data processing steps:
    • C. Tools for processing big data:
      • Microsoft Excel (or some type of spreadsheet tool - i.e. Calc is another one)
      • Hadoop - a well known big data tool, provided by Apache, requires extensive programming knowledge to set up and use
      • SAS - provides a more intuitive interface and better graphical representations of data
      • Google Prediction API - takes advantage of machine learning to extract meaning from data
      • BitDeli - lightweight, easier to use version of Hadoop
    • D. Very few restrictions on use of big data
      • Companies collect large amounts of data on their customers
      • Can be sold to other companies
      • Can be sold to the government
      • Can be used to “de-anonymize” someone

Homework

Students are to pick three topics they want to research that use big data. It is preferred that these topics relate to something learned this year in the course (e.g., the need for IPv6). Tomorrow, as the students enter class, they will sign up on a list with their chosen topic. Since the students will have three options, it is likely they will get one of their selected topics to research.

Session 2 - Where can Big Data be used?

Getting Started (5 min) - Journal

Journal: Think about you daily and weekly activities. What types of data are being stored about you? What information (a collection of facts and patterns) is extracted from the data?

Remind students to think about what they do online, in stores, while in a car, etc.  

Guided Activities (10 min) – Processing Big Data

Review the steps to processing Big Data:

  1. Identify appropriate data source and form questions
  2. Extract data source into a format supported by underlying tools
  3. Normalize data (remove redundancies, irrelevant details)
  4. Import data into tool
  5. Perform analysis - create information from the data
  6. Visualize results

As a class, walk through these steps using the two files in the lesson resources folder (FailedBanklist.csv & Consumer_Complaints.csv)

Step 1.

Demonstrate how files such as these can be obtained at http://catalog.data.gov/dataset

Formulate questions such as:
Are there any banks that are on both the complaint list and the failed bank list?
Can we make some deductions about banks that may be on both lists? If so, what deductions can we make?

Step 2.

Extract data source into a format supported by underlying tools

Open one of these files in Notepad (or some simple editing program such as Notepad++) and demonstrate how the actual data itself is separated by commas, thus the file name “csv” for comma separated value.

Open both files in Microsoft Excel. Complete a find for the bank name “Banco Popular de Puerto Rico” on both lists. You may want to first sort the data by bank name to find this bank or you can use CTRL + F to find the bank name (see screenshots below).

Step 3.

Normalize data (remove redundancies, irrelevant details)

In this step, there is technically no need to remove redundancies or irrelevant details but you can show the students how you could remove data or limit the data to a particular data set. For example, if were to want to look at only the banks from Maryland, you can use the filter tool to only view those banks from MD.

Step 4.

Import data into tool

Right now the file type is as a csv file. By resaving the file as a .xlsx file it becomes a true spreadsheet file. 

Step 5.

Perform analysis

We have determined that the bank “Banco Popular de Puerto Rico” is on both lists. Now ask the students “Why is this bank on both lists?” Note: On the Failed Bank list the Banco Popular de Puerto Rico is actually an acquiring institution. By looking more closely at the dates of the acquisition of the failed bank “Westernbank Puerto Rico” one can formulate some possible deductions that maybe the reason “Banco Popular de Puerto Rico” is on the complaint list is because they had recently taken over a failed bank. It could be possible that some of these complaints were related to this recent acquisition. Is there any other information that can be gathered? What opportunities for identifying trends, making connections, and addressing problems does the data provide?

Step 6. 

Visualize Results

Explain to students that they will learn more about visualizing their results in Unit 6. They can complete graph visualization in excel. Show them the website: http://www.gapminder.org/. Explain that even though a visualization in excel is not interactive like http://www.gapminder.org/, they can complete some form of visualizing their data by using a spreadsheet. Note: http://www.gapminder.org/ is VERY attention-grabbing. Only briefly show the students what they can do with it (see how data changes over time, look at many different data sets, and download data in different forms - including csv and xlsx formats).

Independent Activity (30 min) – Online Research

Students should research their selected topics from homework. Some possible websites for finding data are listed above under “Possible good resource(s) for data collection.”

Students are to get your approval for a topic and then use the Big Data Sets Worksheet in the Lesson Resource Folder to find big data sets that are related to the approved topic. 

Wrap Up (5 min) – Exit Slip

Students are to review using http://www.gapminder.org/ looking specifically at life expectancy. Students will write one question after “playing” the timeline of life expectancy using gapminder on an exit slip before leaving class. For example, one may write “Why is the life expectancy of countries such as Denmark, Sweden, & Norway typically higher than other countries throughout most of the timeline?”


Evidence of Learning

Formative Assessment

Students are to submit a document stating their topic for research using Big Data. This document should answer the questions:

Topic:
How is Big Data used to solve or remedy the topic?
Link(s) used to find Big Data? (i.e. data.gov, etc)


Summative Assessment

How has the transformation of data storage affected how data itself is used? 

Answer: Storage and processing of large digital data enables us to analyze large data sets quickly rather than small sampling sizes as used before. 

How can a computer use Big Data to make predictions? 

Answer: Computers can use smart algorithms, powerful processors, and clever software to make inferences and predictions for solvable questions.

Lesson Summary

Pre-Lesson Preparation

  • Teachers will need to have a piece of paper with a unique number on it for all but one student in the class.
  • Students will need access to the datasets and Python skeleton code in the Lesson Resources folder.
  • Teachers will need to print out the "Search Comparison Worksheet" for each student.

Summary

Students investigate data organization, simulate linear and binary searches, and write pseudocode and Python for linear and binary search methods.

Outcomes

  • Students will demonstrate how combining data sources and classifying data are part processing data
  • Students will describe challenges to structuring large data sets for analysis
  • Students will identify how the order of data influences which methods are appropriate for searching the data.
  • Students will describe standard search algorithms in pseudocode and in Python.
  • Students will Compare different algorithms for efficiency when searching for an item.

Overview

Session 1

  1. Getting Started (5 min) - Journal
  2. Class Discussion and Activities (25 min) - Introduction to linear and binary search algorithms
  3. Coding Linear Search (20 min)

Session 2

  1. Getting Started (5 min) - Think-Pair-Share 
  2. Coding Activity (30 min) - Write pseudocode and implement binary search
  3. Compare Searches (15 min) - Fill out worksheet to compare search algorithms

Source

Phone book presentation adapted from a lesson taught by Dr. Rheingans in CMSC 201 at the University of Maryland, Baltimore County

 

Learning Objectives

CSP Objectives

  • EU CRD-2 - Developers create and innovate using an iterative design process that is user-focused, that incorporates implementation/feedback cycles, and that leaves ample room for experimentation and risk-taking.
    • LO CRD-2.C - Identify input(s) to a program.
    • LO CRD-2.F - Design a program and its user interface.
    • LO CRD-2.J - Identify inputs and corresponding expected outputs or behaviors that can be used to check the correctness of an algorithm or program.
  • EU DAT-2 - Programs can be used to process data, which allows users to discover information and create new knowledge.
    • LO DAT-2.A - Describe what information can be extracted from data.
    • LO DAT-2.C: - Identify the challenges associated with processing data.
    • LO DAT-2.E - Explain how programs can be used to gain insight and knowledge from data.
  • EU AAP-2 - The way statements are sequenced and combined in a program determines the computed result. Programs incorporate iteration and selection constructs to represent repetition and make decisions to handle varied input values.
    • LO AAP-2.A - Express an algorithm that uses sequencing without using a programming language.
    • LO AAP-2.L - Compare multiple algorithms to determine if they yield the same side effect or result.
    • LO AAP-2.M - For algorithms: a. Create algorithms. b. Combine and modify existing algorithms.
    • LO AAP-2.O - For algorithms involving elements of a list: a. Write iteration statements to traverse a list. b. Determine the result of an algorithm that includes list traversals.
    • LO AAP-2.P - For binary search algorithms: a. Determine the number of iterations required to find a value in a data set. b. Explain the requirements necessary to complete a binary search.
  • EU AAP-3 - Programmers break down problems into smaller and more manageable pieces. By creating procedures and leveraging parameters, programmers generalize processes that can be reused. Procedures allow programmers to draw upon existing code that has already been tested, allowing them to write programs more quickly and with more confidence.
    • LO AAP-3.A - For procedure calls: a. Write statements to call procedures. b. Determine the result or effect of a procedure call.
  • EU AAP-4 - There exist problems that computers cannot solve, and even when a computer can solve a problem, it may not be able to do so in a reasonable amount of time.
    • LO AAP-4.A - For determining the efficiency of an algorithm: a. Explain the difference between algorithms that run in reasonable time and those that do not. b. Identify situations where a heuristic solution may be more appropriate.

Math Common Core Practice:

  • MP1: Make sense of problems and persevere in solving them.
  • MP2: Reason abstractly and quantitatively.
  • MP6: Attend to precision.
  • MP7: Look for and make use of structure.

Common Core Math:

  • F-BF.1-2: Build a function that models a relationship between two quantities

Common Core ELA:

  • RST 12.3 - Precisely follow a complex multistep procedure

NGSS Practices:

  • 2. Developing and using models
  • 3. Planning and carrying out investigations
  • 5. Using mathematics and computational thinking

Key Concepts

Students will:

  • Demonstrate how combining data sources and classifying data are part processing data
  • Describe challenges to structuring large data sets for analysis
  • Be able to identify how the order of data influences which methods are appropriate for searching the data.
  • Be able to describe standard search algorithms in pseudocode and in Python.
  • Compare different algorithms for efficiency when searching for an item.

Essential Questions

  • How does abstraction help us in writing programs, creating computational artifacts and solving problems?
  • How can computational models and simulations help generate new understanding and knowledge?
  • How can computation be employed to help people process data and information to gain insight and knowledge?
  • What considerations and trade-offs arise in the computational manipulation of data?
  • What opportunities do large data sets provide for solving problems and creating knowledge?
  • How are algorithms implemented and executed on computers and computational devices?
  • What kinds of problems are easy, what kinds are difficult, and what kinds are impossible to solve algorithmically?
  • How are algorithms evaluated?
  • How do computer programs implement algorithms?
  • How does abstraction make the development of computer programs possible?
  • Which mathematical and logical concepts are fundamental to computer programming?

Teacher Resources

Student computer usage for this lesson is: required

In Lesson Resources Folder:

  • Search Comparison Worksheet to use at the end of Session 2
  • DataSets folder that contains the six datasets for the Search Comparison Worksheet
  • SearchCode.py contains skeleton code for a numeric search program
  • Binary Search in LISP example to show on the screens.

Lesson Plan

Session 1

Getting Started (5 min)

Journal: What would be the best way to organize a collection of DVDs so that you could find the one you want very quickly? Would you need a different method for a radio station with thousands of DVDs? Discuss.

Class Discussion and Activities (25 min)

Linear Search Discussion (5 min)

As a class, discuss the following questions:
What if you are looking for a specific song. What is the most effective way to look for something if it is in an unordered, unsorted collection?
What are some challenges to organizing large sets of information so they can be processed?
Why does it help to classify data when you are trying to sort it or search through it?

Possible Answers and suggestions for discussion:

  • You can search Randomly (How do you stop yourself from repeating yourself?)
  • You can search One-by-one, marking off which ones have already been checked.
  • You can divide up the problem and each search a pile (multiprocessing)
  • Challenges: data entry is time consuming. Large sets of data are more important to keep organized but if you combine information from multiple places they might be organized differently so you have to come up with a single system.
  • If you don’t classify the information you’re sorting or searching such as the album title, song title or other searchable information, it would be very confusing to try to find what you are looking for, especially if you don’t know exactly what you are trying to find.
  • Having things grouped by artist, genre, or other categories makes it easier to narrow down what you're looking for.

Linear Search Activity (5 min)

  1. Pass out pieces of paper with numbers on them to everyone in the class except one person. Students should not share their numbers with anyone.
  2. Have everyone stand at the front of the room in a line. The person without a number stands in front and is assigned a number to look for. The class should keep track of how many people they have to ask before they get the number (students that have been checked should sit down).
    • Do this activity a few times. In between, each student should switch with another student a couple of times without showing their paper. (Note: A fun way to do this might be a snowball fight if you can)
    • Ask for a number that does not exist in the set at least once. What happens? How many people does it take before they figure out the number is not in the set?
  3. Have everyone sit down but keep their papers.

Binary Search Discussion and Presentation (10 min)

Take out a dictionary (or phone book). Ask the class, how would you search for a particular word/name?

Steps for Binary Search in a book of items to demonstrate to the class:

  1. Flip to the middle and pick a word in the middle of the page
  2. Is your word higher or lower than this word? If it is higher, “throw out” the lower half of the book. If it is lower, “throw out” the top half. (Not literally unless it is a very old phonebook. Students do love it when you tear up the phonebook, though, and it makes for a very effective demonstration.)
  3. Repeat steps 1 and 2 until you find the word.

Why would this not work for an unordered list?

Binary Search Activity (5 min)

  1. Have everyone go to the front of the room and get in numerical order by paper. Repeat the search activity using the binary search algorithm.
  2. Try with a number not in the list. How can you tell when it doesn’t exist?

 

Coding Linear Search (20 min)

  1. Have students work in pairs to write a code for linear search. The code should:
    1. Read in a csv file that has a list of unordered numbers. (They should input the file name from the user.)
    2. Ask the user what number they want to find and validate that number.
    3. Tell the user whether the number was found. If it was, it should output the number of items it had to look at.
  2. Students should save their code for the next day.

Note: Skeleton code (SearchCode.py) is provided in the Lesson Resources Folder.

Session 2

Getting Started (5 min) 

Think-Pair-Share

  • Ask students to list non-numbered, real-world things that they search for or sort/order in their daily lives.
  • Can all data be sorted, or do types of data exist that cannot be sorted? How would you organize and search these types of data?
  • Is there always a "correct" solution when sorting data?

Coding Activity (30 min)

Part 1 Pseudocode (10 min)

  1. As a class, write the major steps for Binary Search on the board.
  2. Identify algorithms that could be combine to make a binary search algorithm.
  3. In pairs, have the students pseudocode a binary search algorithm.
  4. Discuss the advantages of using established algorithms.
  5. When finished, display binary search code in other languages. (Scratch https://scratch.mit.edu/projects/23163175/; javascript https://www.khanacademy.org/computing/computer-science/algorithms/binary-search/a/implementing-binary-search-of-an-array ; LISP (see handout)

Part 2 Coding (20 min)

  1. Students should use their pseudocode to write a program for binary search. (It would be useful to build on the skeleton code provided for linear search.) The code should accomplish the same things as linear search: read in a file, get a number, and output if the number is found and how many items were checked.

Guidance for Practice Questions - Question Set 20

Questions in the AP Classroom Question Bank may be used for summative purposes.  

Sixty of the 80 questions are restricted to teacher access.  The remaining 20 questions are from public resources.  

Questions are identified by their initial phrases.

A code segment will be used to swap the values of

A programmer is deciding between using a linear...

A programmer wrote the code segment below to di...

An algorithm will be used to identify the maximum

In the procedure Mystery below, the parameter n...

The program segment below is intended to move a...

The question below uses a robot in a grid of sq...

There are 32 students standing in a classroom. ...

Comparing Searches (15 min) – (May also be Homework if programs are not finished)

  • Pass out the worksheet "Search Comparison Worksheet" from the lesson resources folder.
  • Students should run both their linear and binary search programs with the six provided datasets of increasing sizes, (also in the lesson resources folder.)
  • As they go through, students should record their results in the worksheet and answer the questions at the bottom.
  • Discuss the results as a class.

Options for Differentiated Instruction

For students that have difficulty understanding the concepts of searching for items in a set of data, pair those students with a student who has a firm grasp of the concept for the activities. Have the pair work together for 1A and then have them keep their own paper secure using the extra game sheet (1A').  Simiilarly for 1B - 1B' and 1C - 1C'.


Evidence of Learning

Formative Assessment

Correctness of Python functions for linear search and binary search


Summative Assessment

"Searching Assessment Items.docx" in lesson folder

"Search Comparison Worksheet" in the lesson folder

Lesson Summary

Summary

In this three-session lesson, students explore and confront the difficulties of the problem of sorting data and the difficulties involved in expressing a clear and efficient algorithm for sorting. 

Outcomes

  • Students will be able to relate a real-world task such as sorting cards to sorting/organizing information in a computer.
  • Students will understand the problem of sorting and why it is nontrivial for large data sets.
  • Students will be able to describe in pseudocode simple sorting algorithms (such as bubblesort).
  • Students will be able to reason about the correctness and efficiency of different sorting algorithms, and will understand that the time required to sort a data set increases as the size of the data set grows.

Overview

Session 1:

  1. Getting Started (5 min) - Journal
  2. Activity Card Sorting (40 min)
    1. Explain the Problem [10 min]
    2. Paired Activity: Algorithm Creation [30 min]
  3. Wrap Up (5 min)

Session 2:

  1. Getting Started (5 min) - Journal
  2. Group Activity: Algorithm Evaluation (20 min)
  3. Group Activity: Algorithm Selection and Justification (20 min)
  4. Wrap Up (5 min)

Session 3:

  1. Getting Started (10 min) - Journal and short discussion
  2. Guided Activity: Algorithm Analysis (10 min)
  3. Paired Activity: Algorithm Analysis (20 min)
  4. Wrap Up (10 min)

Learning Objectives

CSP Objectives

  • EU CRD-1 - Incorporating multiple perspectives through collaboration improves computing innovations as they are developed.
    • LO CRD-1.A - Explain how computing innovations are improved through collaboration.
    • LO CRD-1.C - Demonstrate effective interpersonal skills during collaboration.
  • EU CRD-2 - Developers create and innovate using an iterative design process that is user-focused, that incorporates implementation/feedback cycles, and that leaves ample room for experimentation and risk-taking.
    • LO CRD-2.E - Develop a program using a development process.
  • EU AAP-2 - The way statements are sequenced and combined in a program determines the computed result. Programs incorporate iteration and selection constructs to represent repetition and make decisions to handle varied input values.
    • LO AAP-2.L - Compare multiple algorithms to determine if they yield the same side effect or result.
  • EU AAP-4 - There exist problems that computers cannot solve, and even when a computer can solve a problem, it may not be able to do so in a reasonable amount of time.
    • LO AAP-4.A - For determining the efficiency of an algorithm: a. Explain the difference between algorithms that run in reasonable time and those that do not. b. Identify situations where a heuristic solution may be more appropriate.

Math Common Core Practice:

  • MP2: Reason abstractly and quantitatively.
  • MP4: Model with mathematics.
  • MP8: Look for and express regularity in repeated reasoning.

Key Concepts

The algorithmic techniques and analysis involved in sorting data are seen in a wide variety of contexts and applications. Sorting numbers in a list is challenging but foundational to many algorithms in computer science. 


Essential Questions

  • How can computation be employed to facilitate exploration and discovery when working with data?
  • What considerations and trade-offs arise in the computational manipulation of data?
  • Why are some languages better than others when used to implement algorithms?
  • What kinds of problems are easy, what kinds are difficult, and what kinds are impossible to solve algorithmically?
  • How are algorithms evaluated?
  • Which mathematical and logical concepts are fundamental to computer programming?

What makes a "good" algorithm?

What should be taken into consideration when comparing algorithms that complete the same task?

Teacher Resources

Student computer usage for this lesson is: optional

  • You will need enough playing cards for every pair of students to have eight different cards. Alternatively, you may make your own cards (e.g., using index cards) with different numbers on them in place of playing cards.
  • Handout - Sorting Algorithm Evaluation Student Worksheet.docx (in lesson folder)
  • Video collection - https://www.youtube.com/user/AlgoRythmics/videos
    • Note: If students do not have access to computers to individually watch the sorting videos during the paired activity in Session 3, you could instead choose one of the algorithms and show it to the class, having all pairs complete the algorithm evaluation handout for that selected method.

 

Lesson Plan

Session 1

Getting Started (5 min)

Journal: Have students respond to the following questions:

  • If you had 1 million books, and you had to be able to find any book by its title as fast as possible, how would you organize them? 
  • How many books would you need to look at in the worst case scenario to find the title before you have organized the books?
  • How many books would you need to look at in the worst case scenario to find the title after you have organized the books?

Teacher note: Having just finished the lessons on searching, students should recall that searching an ordered list allows using the binary search which is faster than searching an unordered list with a linear or random search.

Activity: Card Sorting (40 min)

Teacher note: The focus should be directed more toward the problem-solving technique than nitpicking about the language used. Although students are writing instructions for a human to manipulate a set of playing cards, they still need to be precise, because the assumption is that the person doesn't know what they are doing. This problem is challenging and will require creativity.

Explain the Problem [10 min]

  • Demonstrate the card sorting sorting task as you explain.
  • Clarify the goal: Today, you and a partner are going to design an algorithm and list the instructions for a person to arrange a row of playing cards into order (from lowest to highest value).
  • Explain the basic rules:
    • If a card is on the table, it must be face down.
    • You can only see the value of a card by picking it up and looking at its face.
    • You can only be holding and looking at two cards at a time (1 in each hand).
    • You can compare the values of any of the cards you are holding in your hands and determine if one is greater, less than, or equal to the other card.
    • When you put a card down, try to be clear about where it should be put back down.  Cards should be put face down.
    • You cannot use your memory of face-down cards to make decisions about them. You should behave as though you have no recollection of cards that you aren't currently holding.
    • You will have eight cards to practice, but the procedure you follow should be general enough to work for any number of cards.
  • Ask the students whether there are any questions about the rules.
  • Suggest that it could be helpful to break the task down into parts. Using abstraction and collaboration can decrease the size and complexity of the task that each programmer has to solve. Abstraction allows you to build upon existing processes, collaboration allows you to break up the task once you agree on what the strategy is and what the separate tasks are that need to be solved. Example, cards need to be comared and swapped, one person could write the specific steps for how that should be done.

  • Emphasize to students that there are *many* ways to achieve this task. Be creative. As a class, they should try to come up with as many different ways of sorting as possible.

Paired Activity: Algorithm Creation [30 min]

  • If there is an odd number of students, there could be one group of three students.
  • Distribute the cards to student pairs. The cards can be ordinary playing cards, but each pair should receive cards from the same suit that have been shuffled.  Alternatively, you can use handmade cards with arbitrary numbers on them. Before starting, have students agree on the ordering of the cards (e.g., whether aces are high or low). 
  • Have students write their instruction list (algorithm) on a piece of paper.
  • The format of the instruction lists is up to the students; they can create a numbered list, a flow chart, a diagram with text and arrows, or other means of communication.
  • Students should be working productively for the rest of the class: designing and writing their card sorting algorithm.
  • Circulate around the room to make sure that students are on task and that they understand the rules and goals of the activity.

Wrap Up (5 min)

Journal: 

  • If you were to give your algorithm a name that describes how it sorts, what would you name it?
  • Identify the most difficult part of writing down the instructions for your algorithm.

Homework:  Any pairs that did not finish the activity should complete it as a homework assignment before the next session.

Session 2

In this session, students review the sorting algorithms they wrote in session 1. Students will follow the algorithms created by their classmates and discover a variety of sorting strategies. By analyzing the various algorithms, students will attempt to find the "best" sorting strategy.

Teacher note: There are two main difficulties in algorithm design to highlight: (1) It is very difficult to be precise with language without some agreement about what terms mean. (2) Solving the problem by determining the strategies and steps required to sort objects correctly, as well as efficiently, presents a second level of difficulty.

Getting Started (5 min)

Journal: How do you think a sorting algorithm should be "measured" to determine if it is the "best"? Ask for ideas and discuss this during the group activity.

 

Group Activity: Algorithm Evaluation (20 min)

  • Distribute the "Sorting Algorithm Evaluation Student Worksheet.doc" (in the lesson folder).
  • Discuss: How will you determine which algorithm is better and why? How do you define “better”? What criteria will you use?Give students an opportunity to respond individually and collectively. Optionally, you may use a think-pair-share approach or small groups to develop ideas and then share with the class. Solicit student ideas and try to agree of 4 criteria to write down on page 2 of the student worksheet.
    1. Does it work correctly? (correctness)
    2. Is it well written and easy to follow?
    3. Is it efficient in terms of time? Is it efficient only under certain conditions?
    4. Is it efficient in use of space? Do you need a lot of extra table room as temporary places to arrange the cards in the process of sorting?
  • Re-distribute playing cards to student pairs.
  • Instruct students to follow directions on the Sorting Algorithm Evaluation worksheet and use it to record their experiences.
  • At the end of the session be sure to point out:
    • An efficient algorithm for a problem can help solve larger instances of the problem.
    • What looks more complex in code may actually be a more efficient solution.
    • Different correct algorithms for the same problem can have different efficiencies.
    • An advantage of doing this collaboratively is that it can decrease the size and complexity of the task that each programmer has to solve.

Group Activity: Algorithm Selection and Justification (20 min)

Create groups of four by joining the pairs who previously exchanged algorithms.  

Teacher note: This "swapping algorithm" activity works especially well when students exchange algorithms with a group that has a fundamentally different approach. However, as a practical matter, this can be hard to arrange. From your observations during Session 1, you might have a sense of groups with different approaches that you can assign to swap algorithms.

  • Groups should give feedback to each other about the algorithm, making sure to discuss:
    • The algorithm's correctness - does it work; do you understand it; could you simulate it or act it out?
    • Explain confusing/ambiguous parts of the other group's algorithm in order to help them write it better.
    • Discuss which of the two algorithms is "better" and be able to explain why.
    • Each group of 4 will nominate one of the two algorithms as the better one of the group.
  • Ask each group to volunteer the better algorithm at their table.
  • Act out/simulate one group's instructions.

Teacher note: It is possible that the nominated algorithm won't work perfectly. If you encounter any problems with the directions, give them the benefit of the doubt and simulate it as best you can to enable the class to understand the intent.

  • Ask if other groups solved the problem with a different strategy and demonstrate a few groups' algorithms.
  • Engage students in a discussion about which algorithm was the best. Why is it best? How should algorithms be evaluated?
  • Ask students to argue for one method or another and explain their reasons.

Teacher note: The students will perform an actual analysis in a later lesson, so it's okay at this point to simply guide the discussion to see how students are thinking. They will re-examine these ideas later.

  • Point out to the students that the speed at which the actor can follow the algorithm is not a measure of the algorithm, but rather the speed of the person. As an analogy, you might compare a person's walking speed with the distance they have to travel. To compare algorithms, you need to measure some "units of work". What those units are is debatable, but must to be agreed upon. For card sorting, "work" could mean picking a card up, putting it down, comparing with another card, etc.

Wrap Up (5 min)

Remind students of the two main issues in writing effective algorithms:

  • The need for a clear, unambiguous language for expressing algorithmic solutions.
  • Defining criteria for determining whether an algorithm is "good."

Homework:  Assign students to write a final version of their algorithm, working out any ambiguities or other problems revealed during the activities.

Session 3

In this session, we end the set of sorting activities by relating sorting to algorithms in the real world. A further exploration of algorithm analysis with some new algorithms will sharpen their intuition about what should and shouldn't be "counted" when analyzing algorithms, what is "hard" for a computer to do, or what takes a "long time."

Getting Started (10 min)

Journal: discuss ideas and elements from the previous lesson:

  • What is "work" for a computer?
  • Why would the efficiency of an algorithm be measured in terms of both speed and amount of "space" (memory) required?
  • Why does it matter how an algorithm (or program) will be used to decide if it is an appropriate solution?

Teacher should clarify that in general, we want these things from an algorithm:

  • To provide a correct solution for any given input.
  • To use computational resources as efficiently as possible.
  • The suitability of a solution depends on how it will be used. Some sorting algorithms work best in terms of time or memory used, others excel at particular situations like sorting a list that is already mostly sorted but work poorly on lists that are badly out of order.

Guided Activity:  Algorithm Analysis (10 min)

Paired Activity: Algorithm Analysis (20 min)

  • Assign each pair a different sorting algorithm.
  • Have each pair watch their assigned sorting video and complete the Sorting Algorithm Evaluation for that method.
  • Have each group attempt to write pseudocode for their assigned sorting algorithm.  Some of the algorithms are quite complex, so emphasize to the students that the goal of this exercise should be to think about what's happening, rather than to get it completely "right." 
  • Compare this algorithm to the one you developed.
  • What basic steps do the algorithms have in common? How can abstraction make it easier to solve similar problems by building on existing solutions?

Wrap Up (10 min)

  • Instruct students to discuss the following prompts with an elbow partner and then, collaboratively write a response in their journals that incorporates the ideas of both partners.
    • What should and shouldn't be "counted" when analyzing algorithms?
    • What is "hard" or time consuming for a computer to do?
    • Why is the efficiency of algorithms important?
  • Assign homework for next lesson on comparing algorithms (Unit 5, Lesson 4): Identify in your journal two places that you often travel between. Of the alternative routes available, what do you consider to be the best route? Why? Are there circumstances in which an alternate route is better? When is that the case?

     

 


Options for Differentiated Instruction

Suggestion: If you have a mix of new and advanced students, challenge the advanced students to sort twice as many cards with a parallel processing algorithm of their own design. Each student on the team can perform one action at the same time. 


Evidence of Learning

Formative Assessment

Evaluation of algorithms

Convert actions into an algorithm


Lesson Summary

Pre-lesson Preparation: You should familiarize yourself with www.sorting-algorithms.com paying particular attention to the variety of algorithms and settings along the top of the page. For session 2, you should have the timedsorts.py code and data files (in the lesson folder) readily available for your students.

Summary

In this two-session lesson, students will explore algorithmic efficiency. They will understand the idea through discussion, manual analysis of simple algorithms, and data collection for implemented algorithms.

Outcomes

Students will be able to:

  • identify algorithms that have different efficiencies in their problem solving approach.
  • explain the metrics used to describe efficiency.
  • perform an empirical analysis of sorting algorithms by running the algorithms on different inputs.

Overview

Session 1:

  1. Getting Started (5 min) 
  2. Guided Activity (40 min)
    1. Good Algorithms and Better Algorithms (5 min)
    2. Algorithmic Efficiency (10 min)
    3. Computational Complexity (10 min)
    4. Comparing Sorting Algorithms (15 min)
  3. Wrap Up (5 min)

Session 2:

  1. Getting Started (5 min)
  2. Empirical Investigation (40 min)
    1. Introduction (5 min)
    2. Experimental Design (10 min)
    3. Data Collection (25 min)
  3. Wrap Up (5 min)

Learning Objectives

CSP Objectives

  • EU CRD-1 - Incorporating multiple perspectives through collaboration improves computing innovations as they are developed.
    • LO CRD-1.A - Explain how computing innovations are improved through collaboration.
  • EU CRD-2 - Developers create and innovate using an iterative design process that is user-focused, that incorporates implementation/feedback cycles, and that leaves ample room for experimentation and risk-taking.
    • LO CRD-2.C - Identify input(s) to a program.
  • EU AAP-2 - The way statements are sequenced and combined in a program determines the computed result. Programs incorporate iteration and selection constructs to represent repetition and make decisions to handle varied input values.
    • LO AAP-2.A - Express an algorithm that uses sequencing without using a programming language.
    • LO AAP-2.B - Represent a step-by-step algorithmic process using sequential code statements.
    • LO AAP-2.L - Compare multiple algorithms to determine if they yield the same side effect or result.
  • EU AAP-3 - Programmers break down problems into smaller and more manageable pieces. By creating procedures and leveraging parameters, programmers generalize processes that can be reused. Procedures allow programmers to draw upon existing code that has already been tested, allowing them to write programs more quickly and with more confidence.
    • LO AAP-3.D - Select appropriate libraries or existing code segments to use in creating new programs.
    • LO AAP-3.F - For simulations: a. Explain how computers can be used to represent real-world phenomena or outcomes. b. Compare simulations with real-world contexts.
  • EU AAP-4 - There exist problems that computers cannot solve, and even when a computer can solve a problem, it may not be able to do so in a reasonable amount of time.
    • LO AAP-4.A - For determining the efficiency of an algorithm: a. Explain the difference between algorithms that run in reasonable time and those that do not. b. Identify situations where a heuristic solution may be more appropriate.

Math Common Core Practice:

  • MP5: Use appropriate tools strategically.

Common Core Math:

  • S-ID.1-4: Summarize, represent, and interpret data on a single count or measurement variable
  • S-ID.5-6: Summarize, represent, and interpret data on two categorical and quantitative variables

Common Core ELA:

  • WHST 12.1 - Write arguments on discipline specific content
  • WHST 12.4 - Produce clear and coherent writing in which the development, organization, and style are appropriate to task, purpose, and audience
  • WHST 12.6 - Use technology, including the Internet, to produce, publish, and update writing products
  • WHST 12.7 - Conduct short as well as more sustained research projects to answer a question

NGSS Practices:

  • 1. Asking questions (for science) and defining problems (for engineering)
  • 2. Developing and using models
  • 3. Planning and carrying out investigations
  • 4. Analyzing and interpreting data
  • 8. Obtaining, evaluation, and communicating information

Essential Questions

  • How does abstraction help us in writing programs, creating computational artifacts and solving problems?
  • How can computational models and simulations help generate new understanding and knowledge?
  • What kinds of problems are easy, what kinds are difficult, and what kinds are impossible to solve algorithmically?
  • How are algorithms evaluated?

Teacher Resources

Student computer usage for this lesson is: required

Sorting:

Lesson Plan

Session 1

Getting Started (5 min)

Think-Pair-Share: Alternate Routes

  • If you assigned the homework from the previous lesson, ask your students to get out their journals to discuss their entries. If not, you could have them write a response in their journal to the following prompt:
    • Identify two places that you often travel between. Of the alternative routes available, what do you consider to be the best route? Why? Are there circumstances in which an alternate route is better? When is that the case?
  • Have your students pair off to discuss their responses for a minute or two.
  • Ask some of the pairs to share and summarize their journal entries. 

 

Guided Activity (40 min) - Good Algorithms and Better Algorithms 

Briefly discuss with your class the topic: what properties make for a good algorithm? What makes one algorithm better than another? Properties you may want to discuss if your students do not volunteer them:

  • correctness
  • ease of understanding
  • elegance (clarity, simplicity, and inventiveness)
  • efficiency

A good analogy is purchasing a car, where people are concerned about:

  • safety
  • ease of handling
  • style
  • fuel efficiency

Today's session will address the topic of efficiency.

Algorithmic Efficiency [10 min]

Introduce the concept of algorithmic efficiency to your students by asking them if any can describe what algorithmic efficiency is, or what it means for an algorithm to be efficient. Briefly describe efficiency as how well an algorithm uses two resources, time and space (stored memory), to solve a problem. Some topics you may wish to discuss include:

  • Two algorithms may both solve the same problem correctly, but with different degrees of efficiency.
  • An algorithm that is maximally efficient will minimize the resources it uses.
  • Algorithms typically face a space-time tradeoff, where they either use more memory to run faster or take more time but use less memory.
    • When you use a map, you are using more storage resources to go along your route more quickly
  • An example of an algorithm that trades space for time (stores more in memory to operate faster) is a lookup table.
    • A real-world example of a lookup table is numbered valet parking. The valet gives the customer a number and goes to park the customer's vehicle in the parking space with that number. When the customer or valet needs to find the vehicle again, instead of having to search through all the spaces, all they need is the remembered (stored) number to go directly to that parking space.
  • Most of the time, we are more interested in computational efficiency, or time usage of an algorithm.
  • A decision problem is a problem with a yes/no answer  (e.g., is there a path from A to B?). An optimization problem is a problem with the goal of finding the "best" solution among many (e.g., what is the shortest path from A to B?).
  • Efficiency is an estimation of the amount of computational resources used by an algorithm. Efficiency is typically expressed as a function of the size of the input.

Computational Complexity [10 min]

A central idea of analysis of algorithms is that many algorithms will take more and more resources (time or space) as the size of their input increases. Algorithms can be different yet produce the same result.  Algorithm's that require polynomial timesare considered to run in a reasonable amount of time. An algorithm that requires exponential or factorial times are considered unreasonably slow.

When comparing algorithms, time is not measured in seconds but rather in the number of computational steps needed for the algorithm to finish operation on a given input. Great algorithms grow linearly, at the same rate as their input, meaning the time it takes to finish is directly proportional to the size of the problem they are solving (amount of input data). For instance, an algorithm that takes 10 steps for an input of size 10 and 1000 steps for an input of size 1000 is said to be linear in its input. However, most algorithms take longer as their input gets larger. For instance, an algorithm that takes only 25 steps for an input of size 5 may take 100 steps for an input of size 10, 10000 steps for an input of size 100, and one million steps for a size of only 1000 (it is taking quadratically more time as the input gets larger).

When we analyze algorithms, we often talk about the algorithm's computational complexity, which is the order of magnitude of the algorithm's running time. An algorithms efficiency and be estimated by counting the nunber of times the code is executed hoever algorithmic efficiency is calcualted through formal reasoning rather than empirical measures.  

Efficiency is typically expressed as a function of the size of the input. For example, if an algorithm finishes with the same number of steps regardless of the size of its input, it is called constant time, which is O(1) in mathematical form (read aloud as "big-oh one"). Constant time algorithms are the fastest in terms of computational efficiency, and any algorithm that takes a constant number of steps is considered O(1). An algorithm that takes 10 steps for an input of size 10 and also takes 10 steps for an input of size 1000 is likely O(1). However, very, very few algorithms are constant time because most algorithms necessarily take longer as the size of their input increases.

An algorithm that can finish by looking at each piece of its input only once is called linear time or linear order, and is written mathematically as O(n), where n stands for "the size of the input." An algorithm that takes 10 steps for an input of size 10 and also takes 1000 steps for an input of size 1000 is likely O(n). Very few algorithms are linear order, especially if they must compare pieces in their input, such as sorting algorithms. The best sorting algorithms are somewhere between linear time and quadratic polynomial time, written as O(n2), where n2 stands for "the size of the input, squared." Any algorithm that is O(n2) typically must compare each piece of its input with every other piece of input at least once. An algorithm that takes 100 steps for input of size 10 and a million steps for input of size 1000 is likely O(n2).

Algorithm's that require polynomial - not exponential or factorial times - are considered to run in a reasonable amount of time.

Most sorting algorithms are of an order between O(n) and O(n2) known as linearithmic time, written as O(n log n), where log is the logarithmic function. In fact, O(n log n) is the fastest possible order for a comparison-based sorting algorithm. It is impossible for such algorithms to be O(n) since they must make at least some comparisons of their input data. 

Comparing Sorting Algorithms [15 min]

Using the simulation tools at http://www.sorting-algorithms.com/, students will investigate, compare, and contrast sorting algorithms. Notice the grid in the center of the page. Each column is a particular sorting algorithm, and each row is an ordering of horizontal bars (either random, nearly sorted, reversed order, or few unique). Each algorithm will sort the bars in a given cell from top to bottom in increasing order by length.

Ask your students to interact with the website by clicking the green start icons and observing how long it takes each algorithm to sort its bars relative to the other algorithms.

Some questions to have them discuss or record in their journal could include:

  • Find the row for "Random" and click the icon above it to see each algorithm sort a randomly ordered set of bars.
  • Which algorithms are going slow on average? Which ones are fastest?
  • Experiment with larger input sizes by clicking a number for Problem Size near the bottom of the page (30, 40, or 50). Click the icon above Random again. What changes do you notice in the speeds of algorithms? Why are the slow algorithms taking even longer than before? Would you ever want to use them?
  • Set the problem size back to 20.
  • Find the column for "Bubble" (bubble sort) and click the icon above it to see it run on each of the ordering types. Which one finishes first? Why do you think that is?
  • Find the row for "Nearly Sorted" and click the icon above it to see all the algorithms run on nearly sorted input data. Which algorithms finish first? What algorithm is slow on random data but finishes quickly on nearly sorted data? Why do you think it does so?
  • Which algorithms do you think are O(n2)?

Make sure your students understand that the size and order of input data can affect how long an algorithm takes. You should direct or help your students discover that bubble sort is a slow sorting algorithm that can be fairly fast for nearly sorted data. You may wish to discuss that bubble sort is O(n2) in the worst case, explaining why it takes so long for large input, but is O(n) in the best case, which is when input is already (or nearly) sorted. In contrast, selection sort is O(n2) in both the worst and best cases, and merge sort is O(n log n) in both the worst and best cases. In general, most sorting algorithms that we would want to use are O(n log n), since O(n2) is usually too slow. You may also want to mention that bubble sort is considered one of the most inefficient sorting algorithms and that quick sort’s worst performance is on already sorted data, so some quick sort implementations shuffle the inputs before sorting to avoid that situation.

Wrapup (5 min)

Watch one or more of the available movie clips that compare the performance of sorting algorithms:

Suggested list of videos (Many more are available):

Session 2

Getting Started (5 min)

Introduction [5 min]

Journal: Remind your students about the sorting algorithms from the previous session and have them answer the following questions:

  • What are some ways in which one algorithm can be better than another, besides efficiency?
  • Explain what algorithmic efficiency is by discussing two different sorting algorithms.

 

Guided Activity: Empirical Analysis (40 min)

The students will measure and analyze the effect of sorting set size on execution time for a given sorting algorithm using Python code. Using the timedsorts.py file in the lesson resources folder as a basis, the students will perform an experimental analysis to compare sorting algorithms by timing them on input data of different sizees. They will hypothesize, design and code their experiment, collect results, and write a report for homework.

Sorting functions available in the Python code include: quick sort, merge sort, selection sort, insertion sort, and bubble sort. For advanced students or classes may, you may wish to have them implement additional sorting algorithms. 

The sample code includes helper functions to generate random data, to load data from a file, and to time sorting functions on the data. Example code for invoking these functions is included at the end of the file. You can remove this example code before sharing it with your students if you wish to emphasize the programming and critical thinking required to do this project.

Each student (or pair or group) needs their own copy of the Python code to modify for their experiments.

Experimental Design [15 min]

Students will compare sorting algorithms by timing them with Python code on input data of various sizes. Have your students (individually or in pairs) make a hypothesis about what will happen as the size of data input increases, answering the following questions:

  • How can you determine which sorting algorithm is most efficient and which is least effiicient?
  • What sorting algorithm do you think is most efficient, and which is least efficient?
  • What do you hypothesize will happen to the time as the size of the data input increases?
  • What is the independent variable in this experiment?
  • What is the dependent variable?

Have your students write out a description of the steps they will take to perform the experiment.

Data Collection [20 min]

Have students modify their Python sorting code to implement the experimental steps they outlined. Students must:

  • Time their sorting routines with different size sets of items to sort (e.g., 5000, 10000, 25000, 50000). Sample data files are available in the lesson resources folder, but students should use the provided helper function to generate arrays of random data, too.
  • Record (write down) the size of each input array, the name of the sorting function, and the resulting time it took to sort the data for each algorithm/data combination they test.

 

Wrap-up (5 min)

Discuss the results with another student or group. What patterns can be seen in the relationship between the amount of data and the time to run the program?

 

Guidance for Practice Questions - Question Set 21

Questions in the AP Classroom Question Bank may be used for summative purposes.  

Sixty of the 80 questions are restricted to teacher access.  The remaining 20 questions are from public resources.  

Questions are identified by their initial phrases.

A programmer is writing a program that is inten...

The table below shows the time a computer syste...

Homework

Assign as homework, explain the difference between algorithms that run in reasonable time and those that do not. Address how reasonable and unreasonable times are identified and what imapct that difference has on program designmers.

 

Optional Extension

Students complete a short research report on their sorting algorithm research procedure, results, and analysis of the results.

  • What algorithm or algorithms are most efficient? Why?
  • What algorithm is least efficient? Why?
  • What values did you use for your independent variable?
  • Present the data you collected in a table and in a graph.
  • What conclusions can you draw about sorting algorithms?
  • Explain why algorithmic efficiency is important by discussing another problem (not sorting) where a correct but inefficient algorithm is unusable at larger input sizes. 
  • Pick two sorting algorithms you tested. Write a paragraph for each describing how it works, and one paragraph comparing the two algorithms explaining which is more efficient and why (you can do research and look at the Python code to figure out the reasons).

 


Options for Differentiated Instruction

The teacher may decide to have the students choose how they want to organize the empirical analysis effort.  Alternatively, scaffolding with a worksheet or checklist could be used to guide the students through the data collection and analysis tasks.


Evidence of Learning

Formative Assessment

The following "Checks for Understanding" could be used to guide the students towards the three learning objectives:

Objective: SWBAT identify families of correct algorithms that have different efficiencies in their problem solving approach.

  1. Students will pair-share what makes a good choice for the route taken to get from point A to point B.
  2. Students will compare algorithms and explain why and when some are better than others in terms of efficiency.
  3. Students will be able to identify and rank order the least efficient sorting algorithms in the simulations.

Objective: SWBAT demonstrate logical reasoning and metrics is used to describe an algorithm’s efficiency.

  1. Predict:  Students will have seen sorting algorithms implemented as folk dances.  Students will predict -- for their algorithm -- how adding additional dancers would increase the dance completion time.

Objective: SWBAT to perform empirical analysis of sorting algorithms by running the algorithms on different inputs.

  1. Students will work in pairs to collect data on sorting execution times.  The pairs will share their results with other groups to check for patterns before they write up their results.

Summative Assessment

Students will complete a short research report on their sorting algorithm research procedure, results, and analysis of the results.

Lesson Summary

Summary: Students are introduced to the theory of computation, computability, the halting problem, and advanced algorithms. In particular, they will learn about heuristic search used by artificial intelligence (AI) programs to play games.

Objective: 

Students will be able to:

  • define computation and some basic ideas of the theory of computation
  • discuss computability and understand there are some things computers cannot solve
  • explain the Halting Problem
  • identify some advanced search algorithms
  • understand how AI programs represent games with game trees
  • understand how AI programs use uninformed and heuristic search algorithms to play games

Overview:

Session 1

  1. Getting Started (10 min)
  2. Guided Activity (35 min)
    1. Inverse Operations Activity [10 min]
    2. Computation [10 min]
    3. Computability [15 min]
  3. Wrap Up (5 min) - Think-Pair-Share 

Session 2

  1. Getting Started (5 min)
  2. Guided Activity (45 min)
    1. Search and Game Trees [15 min]
    2. Game-Playing AI [10 min]
    3. Types of Heuristic Search [10 min]
    4. Playing a Game with Heuristic Search [10 min]

Learning Objectives

CSP Objectives

  • EU AAP-2 - The way statements are sequenced and combined in a program determines the computed result. Programs incorporate iteration and selection constructs to represent repetition and make decisions to handle varied input values.
    • LO AAP-2.L - Compare multiple algorithms to determine if they yield the same side effect or result.
  • EU AAP-4 - There exist problems that computers cannot solve, and even when a computer can solve a problem, it may not be able to do so in a reasonable amount of time.
    • LO AAP-4.A - For determining the efficiency of an algorithm: a. Explain the difference between algorithms that run in reasonable time and those that do not. b. Identify situations where a heuristic solution may be more appropriate.
    • LO AAP-4.B - Explain the existence of undecidable problems in computer science.
  • EU CSN-2 - Parallel and distributed computing leverage multiple computers to more quickly solve complex problems or process large data sets.
    • LO CSN-2.A - For sequential, parallel, and distributed computing: a. Compare problem solutions. b. Determine the efficiency of solutions.

Math Common Core Practice:

  • MP2: Reason abstractly and quantitatively.
  • MP4: Model with mathematics.

Common Core Math:

  • N-RN.3: Use properties of rational and irrational numbers.
  • F-BF.1-2: Build a function that models a relationship between two quantities
  • F-BF.3-5: Build new functions from existing functions
  • S-IC.1-2: Understand and evaluate random processes underlying statistical experiments
  • S-CP.6-9: Use the rules of probability to compute probabilities of compound events in a uniform probability model
  • S-MD.5-7: Use probability to evaluate outcomes of decisions

Common Core ELA:

  • RST 12.3 - Precisely follow a complex multistep procedure

NGSS Practices:

  • 2. Developing and using models
  • 6. Constructing explanations (for science) and designing solutions (engineering)

Essential Questions

  • How can computational models and simulations help generate new understanding and knowledge?
  • What considerations and trade-offs arise in the computational manipulation of data?
  • What kinds of problems are easy, what kinds are difficult, and what kinds are impossible to solve algorithmically?
  • How are algorithms evaluated?

Teacher Resources

Student computer usage for this lesson is: required

Links to videos and online tools as indicated in the lesson plan.

 Alternative instruction could include the Towers of Hanoi problem and discuss the algorithm for solving it. Some demonstrations are available here:

Lesson Plan

Session 1

Getting Started (10 min)

Think-Pair-Share: In pairs, think about and try to answer each of the following questions:

  • Given y = 7x + 4 and x=3 what are the steps to find y?
  • Given y = 7x + 4 and y=3 what are the steps to find x?
  • Factor 81,927,497 and 81,927,499. Can you figure out the steps?
  • Multiply 431 x 433 x 439. What are the steps?

Note: just give them a few minutes to try the factoring, but round them up to continue and discuss: which operations were much harder to perform than their inverse? Can you just invert the steps, and why or why not?

Guided Activities (35 min)

Inverse Operations Activity [10 min]

  1. Convey the following concepts:
    1. Inverse arithmetic operations
      • add/subtract                     [x + 7 – 7 = x – 7 + 7 = x] ;
      • multiply/divide                  [x * 7 / 7 = x / 7 * 7 = x] ;
      • ex /ln(x)                              [ln(ex) = eln(x) = x];
    2. Some arithmetic operations are harder to do then their inverse operations -- as the students did during the warm-up.
      • Cubing a number versus finding the cube root of the result.
        • Find the cube of 12
        • Find the cube root of 5832
      • Multiplying numbers to form a product versus factoring the product.
    3. The same can be true with algorithms.
      • It is much easier to scramble a Rubik’s cube with a few moves than it is to solve a scrambled Rubik’s cube with a few moves.  [Optional -- Have the students discover this using a Rubik's cube or an online simulated cube].

Make a connection to the previous lesson by comparing these to sorting algorithms, where some are speedy and efficient like Merge sort and Quick sort, and others are unusably slow, like Bubble sort. Highlight the difference that different problems have different lower bounds on optimal solutions, and that some problems like integer factorization have solutions but take too long to be solved in a practical way.

Computation [10 min]

Discuss the definition of computation (in a theoretical sense) with your students. Computation is input plus processing to get output. A computer is one system that is a "model of computation" since it takes input, processes it, and produces output.

Another model of computation is called a Turing machine, named after Alan Turing (one of the most famous computer scientists). A Turing machine is a theoretical entity that has a tape of symbols (a line of 0s and 1s), a head that can read only one symbol at a time, and an internal state that can change based on instructions as the head reads symbols. Turing and a mathematician called Alonzo Church are responsible for the "Church-Turing" thesis, which says that a Turing machine can compute anything that a digital computer can. This is a fundamental idea of the theory of computation, and has the implication that anything one computer is capable of doing is possible to be computed by another, given enough resources (time and memory). 

Parallel computing is a computational model where the program is broken into multiple smaller sequential computing operations, some of which are performed simultaneously. Distributed computing is a computational model in which multiple devices are used to run a program. Comparing efficiency of solutions can be done by comparing the time it takes them to perform the same task.

Computability [15 min]

Now discuss the idea of computability with your students. Ask your students to answer or think-pair-share: are there things it is impossible for a computer to compute? A decision problem that can be solved by an algorithm that halts on all inputs in a finite number of steps is decidable.  The most classic "undecidable" (non-computable) question is called the Halting Problem. The Halting Problem is: make a program that can tell if any other program will halt (terminate at some point eventually) or will loop forever and never end.  This can be done for specific algorithms (instances of the Halting Problem) but not for the general problem regarding all possible algorithms.

The Halting Problem is impossible for a computer to compute, which you can prove (informally) by paradox. Suppose you did have a program that solved the Halting Problem, called HALT(X), which takes the code for some program X as input and says "yes" if X terminates or "no" if X loops forever. Then you could write a new program that uses HALT inside it, which we will call PARADOX(X). First PARADOX(X) will run HALT(X) and if the result is no, PARADOX will halt, but if the result is yes, then PARADOX will loop forever. But here is the problem: what if we use the code for PARADOX as the input to PARADOX, running PARADOX(PARADOX)? If it says that PARADOX halts, then PARADOX runs forever, and if it says PARADOX runs forever, then PARADOX halts. This problem is a paradox and does not make sense because the premise, that a program called HALT could exist, must be wrong! Therefore, the Halting Problem is impossible for a computer to solve.

Video explanation with optional student simulation

  • Alonzo Church, an American, and Alan Turing, from the UK, independently proved in the 1930s – before computers actually existed – that there are some problems that computers will never be able to solve. View: The Halting Problem at: https://www.youtube.com/watch?v=92WHN-pAFCs  [Optional 7:52]  Have groups of students act out the machines in the video to determine whether they understand the basics of the proof.

Wrap Up (5 min)

Think-Pair-Share:

  • Ask your students to think about the following algorithms, pair off, and reorder them based on worst-case computational complexity, with the fastest ones first and the slower (or undecidable) ones last:
    • Bubble Sort
    • Factoring large integers
    • Merge Sort
    • Binary Search
    • Taking attendance
    • The Halting Problem
  • Discuss the orderings that a few groups came up with. Advanced groups could also try to guess the computational complexity:
    • Binary Search, O(log n)
    • Taking attendance, O(n) since you just read off the list in order
    • Merge Sort, O(n log n)
    • Bubble Sort, O(n2)
    • Factoring large integers, O(en), approximately exponential depending on the algorithm
    • The Halting Problem, undecidable

Session 2

Getting Started (5 min)

This session concerns advanced algorithms, in particular heuristic search, which is commonly used in artificial intelligence. Refresh your students' minds on the definitions of computation, computability, and undecidable problems. Additionally, mention the properties we consider when we compare algorithms:

  • correctness
  • ease of understanding
  • elegance and style
  • time/space efficiency

Guided Activity (45 min)

Search and Game Trees [15 min]

Introduce the idea of heuristic search, which is a class of algorithms used in many artificial intelligence programs. A heuristic is something that is used to find a good solution in a reasonable time, and a heuristic search algorithm is an algorithm that uses heuristics to determine how to search through some space.

A great way to introduce heuristic search is first to discuss game trees. A game tree is a structure that is used to represent the "space" of a game that an algorithm wants to search through.

Think of a game like chess: you make a move, the opponent makes a move, and the process continues until the ending conditions have been met (one player in checkmate or stalemate). A game tree is a mathematical structure used by AI and heuristic search algorithms to model the moves made in a chess game. At any turn, we can make a "tree" by drawing the root node as representing the current state of the board and drawing one branch under it for every possible move. In Tic-Tac-Toe, if you are the starting player, then the root node represents a blank board, and there will be nine branches, one for each possible move (each space where you could place your mark). Following a branch in the game tree takes you to a new node that represents the configuration of the game that results from having taken that move. In Tic-Tac-Toe, if I am the first player and place my X in the center space, I have "followed" that branch down the tree to a new node that represents the board with an X in the center space. The opponent then uses this node as the root of their game tree, and has a branch for each of their possible moves.

Think-Pair-Share: Have your students pair off and play a game of Tic-Tac-Toe and try to draw the game tree as they play it, drawing the nodes for each move they made and every potential branch from those nodes. Bring them back into discussion and ask them what if they had to draw out every node followed down every branch? Now ask them to imagine the game tree for chess, which has 20 possible moves on the first turn, 400 on the second, and many, many more as the game goes on. How can an artificially intelligent program learn to play chess when there are so many (too many) options? Chess actually has around 35100 nodes in its tree and 1040 legal states.

Game-Playing AI [10 min]

Heuristic search on game trees is one way AI programs are able to play games like chess. How good are computer game players?

Typically games modelled with game trees are 2-person games, players alternate moves, and they are zero-sum (meaning one player's loss is the other's gain). More complicated elements in such games may have include: hidden information (like other players' hands), chance (dice), or multiple players.

Playing a Game with Heuristic Search [10 min]

How does an AI program use heuristic search to play a game? Typically in these steps:

  • consider all moves that are possible for the current turn
  • compute what the new positions and configuration of the board (the "state") for each of those moves
  • evaluate each state using some scoring function to determine which is better
    • for example, taking an opponent's piece is probably going to be evaluated more highly than simply moving a pawn
  • make the move that results in the best evaluated state
  • wait for your opponent to play, repeat

The key problems are:

  • representing the state of the board
  • generating the resulting states from every move
  • evaluating the value of the resulting states

For evaluation, some function is typically coded or learned over time.

  • For Tic-Tac-Toe, for board state n, an evaluation function could be:
    • f(n) = [# of 3-lengths open for me] - [# of 3-lengths open for you], where a 3-length is a complete row, column, or diagonal
  • Alan Turing's function for chess with board state n:
    • f(n) = w(n)/b(n) where w(n) is the sum of the point value of white's pieces on the board at state n, and b(n) is the sum of black's pieces. The point values are those commonly used by professional chess players, pawn is 1 point, the queen is 9 points.

Types of Heuristic Search [10 min]

Refer to the "Advanced Algorithms" slides in the lesson resources folder for examples of uninformed search. For an activity, you may want to create a game tree for Tic-Tac-Toe and have your students walk through how each of the following algorithms would operate over it.

Uninformed Search are algorithms that work without a heuristic, using no information about the likely "direction" of the goal node. Algorithms include:

  • depth-first search
    • starting at the root of the game tree, pick one branch and examine the node at the end of it, then pick one branch of that node and examine the even deeper node (hence, "depth-first") until you reach the end of the game, then go back one node, pick one of its branches, explore until you reach the end of the game, and so on
    • this search is unpractical for games with many moves that may go on indefinitely
  • breadth-first search
    • starting at the root of the game tree, called A. It has n branches leading to child nodes each called B1, B2, and on to Bn, for some n nodes. Examine each of these children in order before moving onto the children of B1, examining each of those in order, then examining the child nodes of B2 then those of B3, and so on.
    • this search is unpractical for games with many or variable numbers of moves per turn

For any games with variety and complexity, certainly for chess and even checkers, uninformed search is simply too slow because it is exhaustive. This problem is another example, like with sorting, where the efficiency of our solution matters a great deal. To get programs to play games, we need them to be efficient and intelligent about the number and quality of moves they consider. 

Informed Search algorithms each follow some heuristic that uses information about the game to determine smart directions to explore. Examples include:

  • best-first search ("greedy")
    • at every turn, take the branch leading to the node with the greatest value from the evaluation function 
    • the problem here is when there is delayed reward since this "greedy" approach lacks any ability to look ahead. For instance, if there are two moves available, one that is great and another that is just decent, it will always pick the great move even if a winning move could be made the turn after the decent one. 
  • A* ("a star")
    • estimates the goal node and picks nodes based on the least cost. It follows a best-first strategy but also factors in the distance it has traveled from the original root node.
  • Your GPS device! In order to figure out the best, fastest route to your destination, your GPS will search through the possible roads you can take intelligently by factoring in things such as the span and capacity of the road, the traffic, and potentially even the time of day. 

Advanced classes may wish to discuss local search algorithms, such as hill-climbing and genetic algorithms (in the "Advanced Algorithms" slides in the lesson folder).

Minimax

Thinking about game trees again, we want to select the branch that takes us to a node with the maximum evaluated state. But there is a catch: the opponent gets to make moves, too. That is, every other branch in our game tree is the opponent's turn. How does the AI program account for the other player?

Perhaps most logically, the way AI programs do so is to assume the other player will play optimally. Just as the AI will take the branch that leads to the state with the greatest evaluation, it assumes the other player takes the branch leading to the state that will maximize their position. In other words, the AI searches through their game tree by following the branch with the maximum value on their turn, and following the branch with the minimum value on the opponent's turn. This algorithm is called minimax and is the basis of nearly all AI that play 2-person zero-sum games.

Guidance for Practice Questions - Question Set 22

Questions in the AP Classroom Question Bank may be used for summative purposes.  

Sixty of the 80 questions are restricted to teacher access.  The remaining 20 questions are from public resources.  

Questions are identified by their initial phrases.

A certain computer game is played between a human

Under which of the following conditions is it m...

Which of the following programs is most likely to

Which of the following statements is true?


Options for Differentiated Instruction

For the Halting Problem proof, it is important that students can translate the solution that is on the video into a representation that makes (some) sense to them.  Acting out the inputs and outputs of the set of machines is an approach worth trying.


Evidence of Learning

Formative Assessment

The following "Checks for Understanding" could be used to guide the students towards the three learning objectives.

Objective: Students will identify some Advanced Algorithms that Exploit Inverse Operations Efficiency.

  1. Pairs of students will be asked to list pairs of basic arithmetic inverse functions.
  2. The class will develop a composite list of inverse functions found by the student pairs.  Note: many of these pairs share the same key on their graphing calculators.
  3. Students will factor composite numbers and create the same composite numbers from their prime factorization.  They will log the relative effort in their journals.

Objective: Students will identify some Advanced Algorithmic Techniques.

  1. Students will find examples from earlier modules where the algorithms used techniques of heuristics, randomness, probability, etc.  This could be a good group review of prior topics.

Objective: SWBAT discuss at least one example of a computing problems that is unsolvable

  1. Students will either describe the proof of Turing's Halting problem using models in a way that is similar to that used in the lesson video or they wil build a similar physical model using students as the machines.

Summative Assessment

Students will be able to summarize -- in their own words or with simple models -- the proof of the Halting Problem.

Students will be able to identify the sensitivity of cryptography to the difficulty of factoring large numbers.

Lesson Summary

Summary

Students collaboratively define, design and implement a programming project: a miniature version of the Create Performance Task. They create and share presentations and share with groups (of about six students).

Students present. the project goals, the development process they used, functional and dta abstractions they considered and used.

 

Outcomes

  • Students will complete a miniature version of the Create Performance Task.

Overview

Session 1:

  1. Presentation  (5 min) Students are introduced to the Create Task.
  2. Activity (10 min) Students select a small programming project to model the Create Task.
  3. Activity (10 min) Students identify and describe  key algorithm to use in their program.
  4. Activity (10 min) Students identify a functional abstraction that uses a parameter and a data abstraction to use in their program.
  5. Wrap Up (5 min) Students complete the project development plan table.  In it they describe their program goals, a key algorithm they plan to implement, and the abstractions they plan to use.

Sessions 2 and 3:

  1. Journal (2 min) - What is your plan for today for the development of your project?
  2. Activity (40 min) - Students implement their projects.
  3. Journal (5 min) - Reflection. What abstractions did you use in your project?

 

Session 4:

  1. Getting Started (2 min) - Explain goals for the presentations.
  2. Activity (30 min) - Students prepare presentations of their projects.
  3. Presentations (15 min) - Students present their project to table groups.
  4. Wrap Up (3 min) - Students create exit slips with any questions or comments about the Create Task.

Learning Objectives

Key Concepts

Students working in pairs practice choosing a project and planning how to implement it in a fixed time frame.

Students have just two days to plan and implement a project that uses a user created function and makes use of a list or other data structure.  Since this is a practice task, teachers may provide help with algorithms, functions, data abstraction or other programming concepts. Remind students that during the actual task teachers may not provide this kind of support so as much as possible students should rely on the collaborative partners for assistance when needed. 

Students may receive most or all of the credit from an incomplete project if the project demonstrates the required components.

For this practice task, teachers may want to provide program stubs.  Stubs could include a suggested function and data abstraction. Functional abstraction should include a parameter that is used to make a selection in the procedure.


Teacher Resources

Lesson Plan

Session 1:  Planning Day

Briefly present an overview of the Create Performance Task. Explain that students will have 12 hours to complete the Create Task later in the course and they will four sessions for the practice version we are beginning today. Explain that the Full Create Performance Task Guidelines are provided as a resource and a guide for students however students are to complete the guidelines for the practice task.  Requirementst from the full task that we are not doing are struck though in the lis below.  For instance, in the practice Create Task we will not be producing a video or writing responses about the video.

Full Create Performance Task Guidelines

 

General Requirements

You will be provided with a minimum of 12 hours of class time to complete and submit the following:

    1. your complete program code;
    2. a video (created independently) that displays the running of your program and demonstrates functionality you developed; and
    3. your individual written responses to all the prompts in the performance task.

 

 

Submission Requirements

  1. Program Code

Your program must demonstrate:

  • output (tactile, audible, visual, or textual) based on input from:
    • the user (including user actions that trigger events); or
    • a device; or
    • a file;
  • use of at least one list (or other collection type)to represent a collection of data related to the program’s purpose; and
  • development of at least one procedure that uses one or more parameters to accomplish the program’s intended purpose, and that implements an algorithm that includes sequencing, selection, and iteration.

Include comments or acknowledgments for any part of the submitted program code that has been written by someone other than you and/or your collaborative partner(s) including the origin of the code.  The acknowledgement should include the origin or original author’s name.

Create a PDF file that contains all of your program code (including comments).

  1. Video

Your video must demonstrate your program running, including:

  • input to your program; and
  • at least one aspect of the functionality of your program; and
  • output produced by your program.

Your video:

  • must be either .mp4, .wmv, .avi, or .mov format; and
  • must not exceed 1 minute in length; and
  • must not exceed 30MB in file size.

Collaboration is not allowed during the development of your video. Your video must not contain any distinguishing information about yourself.  Your video must not be narrated, but text captions are encouraged.

  1. Written Responses

Submit one PDF file that includes your responses to each prompt below. Clearly label your responses 3a–3d in order. Your response to all prompts combined must not exceed 750 words, exclusive of the program code. Collaboration is not allowed when answering the written responses.

3a. Provide a written response that:

  • describes the overall purpose of the program; and
  • describes what functionality the video illustrates; and
  • describes the input and output shown in the video.

 

3b. Capture and paste two program code segments you developed during the administration of this task which contain a list (or other collection type) being used in your program. The first program code segment must show how data has been stored in the list1. The second program code segment must show the data in the same list being processed, such as creating new data from the existing data.  Then, provide a written response that:

  • identifies the name of the list being processed in this response; and
  • identifies what the data contained in the list is representing in your program; and
  • explains how the selected list manages complexity in your program code by explaining how your program code would be written differently without using this list.

 

3c. Capture and paste a procedure from your program that you developed during the administration of this task which implements an algorithm used in your program. This procedure must:

  • contain and use one or more parameters that have an effect on the functionality of the procedure;  and
  • implement an algorithm that includes sequencing, selection and iteration.  

Then, provide a written response that:

  • describes what the selected procedure does and how it contributes to the overall functionality of the program; and
  • explains how the algorithm implemented in the selected procedure accomplishes its task.

 

3d. Provide a written response that:

  • describe two calls to the selected procedure identified in written response 3c. Each call must pass different arguments that cause a different segment of code in the algorithm to execute; and
  • describes what condition(s) is being tested by each call to the procedure; and
  • identifies the result of each call.

 

Practice Create Performance Task Guidelines

For this practice task, students will complete simpler project and a two to three minute presentation about it, rather than a video and a report. Students work in partners to select and develop projects.

After completing the project, students will create a presentation about it.  The presentation should address:.

 

  1. Program Goals
  2. Key algorithm - Describe a selection the algorithm makes and describe how the algorithm uses iteration.
  3. Functional abstraction - Describe at least two different tasks the function may perform.
  4. Data abstraction - Describe the values that will be stored and how they will be used by the program.

 

Projects are chosen by the students.  If they wish, projects may be based on the following labs from How to Think Like a Computer Scientist.  Before selecting a project students should consider how they might use a data structure (list) in their project.

 

How to Think Like a Computer Scientist Labs

Students choose and plan a project with partners and complete the project development table.

 

Project Development Plan

Team Members:

Project Name

 

Project Goals

 

Key algorithm

Describe a selection the algorithm makes and describe how the algorithm uses iteration.

 

Functional abstraction

Describe at least two different tasks the function may perform.

 

Data abstraction

Describe the values that will be stored and how they will be used by the program.

 

 

 

Closing (3 min)

Students submit the Project Development Plan.

Students reflect on their project and making journal entry of goals for tomorrow.

Session 2 & 3: Implementation Days

Warm up (3 min)

Students complete a brief journal entry planning their team and individual work for today.

 

Work Time (45 min)

Students work to implement and test projects. Teachers may evaluate student performance based on student journal entries and their observations of student effort in implementing the project. In the Create Performance task, students should direct their questions with their partners and or consult their class notes and examples for help.  Teachers may only help resolve technical issues (hardware or software) or provide clarification of the project requirements. In this practice task, students may receive limited help in developing the project.

 

Closing (2 min)

Students submit the key algorithm code developed so far.

Students make a journal entry about the progress they made and their goals for tomorrow.

 

Day 4: Presentation Day

Each student prepares a presentation.  Partners may collaborate in the report presentation but each partner must prepare their own report.

 

Getting Started:

 

Explain the presentation requirements.

 

Format:  A two to three minute presentation with partners taking turns in the presentation,

 

Required Contents:

 

Partner Names

 

Purpose

 

Data Abstraction

Function in the project

Code showing creation and use.

 

Functional Abstraction

Function in the project

Code showing definition and calls

Parameter it uses and alternative paths through the procedure

 

Key Algorithm

Purpose of the algorithm

Code and how the algorithm works (pseudocode can be used)

Use of selection and iteration

Presentation Preparation (30 min)

Presentations (15 min)

Students present their project to table groups. Time the presentations so that they do not exceed 3 minutes. Students discuss with table groups what they like about the project, what they learned and any questions or comments they have.

Closing (2 min)

Students submit their presentations and program code.  Students create exit slips with any questions or comments they have about the Create Task.


Evidence of Learning

Formative Assessment

For the practice task, if student projects are too big or too small in scope, teachers should provide guidance.

For the practice task, project descriptions and pseudocode/algorithm description for each proposed project should be assessed.  Assessment can be done by collaborative partners first.

If partners have concerns, they should be brought to the teacher. 


Summative Assessment

The project should be assessed using the latest rubric provided by the College Board.

The latest rubric (updated as of November 2018) is in the lesson folder.  The practice project does not ask students to create a video so row 1 should net be used.  The assessment can be the basis of a summative grade using the suggested scale below.

Complete presentation submitted 25%

Code with abstractions submitted 25%

Rubric Points (5) 5 * number of rubric points up to 50%

Lesson Summary

Summary

Students will define, design and implement a programming project: a miniature version of the Create Performance Task. They will create presentations and share with groups the projects they developed and how their project used abstractions.

Outcomes

  • Students will complete a miniature version of the Create Performance Task.

Overview

Session 1:

  1. Presentation  (5 min) Students are introduced to the Create Task.

  2. Activity (15 min) Students select a small programming project to model the Create Task.

  3. Activity (10 min) Students identify an algorithm to use in their program and share within tabe groups.

  4. Activity (10 min) Students identify an an abstraction to use in their program.

  5. Wrap Up (5 min) Students share the algorithms and abstractions they will use with the class.

 

Session 2:

1. Journal (2 min) - What is your plan for today for the development of your project? What abstractions do you plan on using in your project?

2. Activity (43 min) - Students complete implementing their projects.

3. Journal (5 min) - Reflection. What abstractions did you use in your project?

Session 3:

1. Getting Started (5 min) - Students individually respond to three prompts about their projects.

2. Activity (20 min) - Students prepare one minute presentations of their projects.

3. Presentations (15 min) - Students present their project to table groups.

4. Wrap Up (5 min) - Students create exit slips with any questions about the Create Task.

Learning Objectives

Key Concepts

Students working in pairs practice choosing a project and planning how to implement it in a fixed time frame. 

Students have just two days to plan and implement a project that uses a user created function and makes use of a list or other data strucutre.  Since this is a practice task, teachers may provide help with algorithms, functions, data abstraction or other programming concepts. Remind students that during the actual task teachers may not provide this kind of support so as much as possible studtns should rly on the collaborative partners for assistance when needed.Since an algorithm is a list of steps that comes to a conclusion, if students develop pseudocode for their projects they can refer to the pseudocode as their algorithm. 

Students may receive most of the credit from an incomplete project if the project demonstrates the required components.

For this practice task, teachers may want to provide program stubs.  Stubs could include suggested functions. 

 


Essential Questions

  • How are algorithms implemented and executed on computers and computational devices?
  • How do computer programs implement algorithms?
  • How does abstraction make the development of computer programs possible?

Teacher Resources

Lesson Plan

Session 1:  Planning Day

Present an overview of the Create Task.

Explain that students will have 12 hours to complete the Create Task later in the course and they will three 50-minute sessions for this practice.  The actual Create Task will have a suggested collaborative component and be larger in scope.

Discuss the following guidelines for the full project and the practice project we will be doing.

Full Create Task Guidelines

Three components to create:

  • Program
  • Report
  • Video

General:

One project - individual with collaboration in stages

12 hours of classroom time

Project must use functional and data abstraction.

 

Report: Written responses  (response to all prompts combined must not exceed 750 words, exclusive of the Program Code.):

a. Provide a written response or audio narration in your video that:  Identifies the programming language and identifies the purpose of your program.   

b. Describe the incremental and iterative development process of your program

c. Describe how a selected algorithm functions.

d. Explain how an abstraction you developed helped manage complexity

 

Practice Create Task Guidelines

For this practice task, students will complete simpler project and a one-minute presentation about it, rather than a video and a report.

Students work individually to select projects.

 

After completing the project, students will create a one-minute presentation about it.  Presentation should address the following.

a.  Identify the programming language and the purpose of your program.   

b. Describe the incremental and iterative development process of your program

c. Describe how a selected algorithm functions.

d. Explain how an abstraction you developed  helped manage complexity

The presentation must address at least points a and b of the above and c or d.

Projects are chosen by the student.  If they wish, their projects may be based on the following labs from How to Think Like a Computer Scientist.

Labs

 

Students select a project and share their ideas with partners. 

After collaborating with partners, students submit to their teacher a brief description of the project describing its most important features and how it will work.

Students identify an algorithm and an abstraction to use in their projects.  The algorithm should be written as pseudocode and then shared with their partners.

Students should discuss in groups what abstraction they chose and how they think it would be helpful.

 

Session 2: Implementation Day

Warm up (2 min)

Students complete a brief journal entry describing:

  • Their plan for today in the development of the project.
  • What abstractions they will be using in the project.

Work Time (43 min)

Students work to implement and test projects. Teachers may evaluate student performance based on student journal entries and their observations of their effort in implementing the project.

 

Closing (5 min)

Students reflect on their project and making journal entry of how they used abstraction in the project.

 

Day 3: Presentation Day

Warmup (5 min)

Students begin by individually responding to these prompts about their project:

b.    describe the purpose, how your program code works and the most important features and algorithms

d.    describe the development process

e.    explain an abstraction and how it helped manage complexity

 

Presentation Preparation (20 min)

Students prepare one-minute presentation about their projects including their responses to prompts b, d and e. 

 

Presentations (15 min)

Students present their project to table groups. Time the presentations so that they do not exceed 1 minute. Students share with table groups what they like about the project, what they learned and any questions they have.

 

Closing (5 min)

Students create exit slips with any questions they have about the Create Task after viewing and discussing the presentations.

 

 


Evidence of Learning

Formative Assessment

For the practice task, project descriptions and pseudocode for each proposed project should be assessed.  Assessment can be done by collaborative partners first.  If partners have concerns, they should be brought to the teacher.  If student projects are too big or too small in scope, teachers should provide feedback.


Summative Assessment

The project should be scored using the latest rubric provided by the College Board. 

The latest rubric (updated as of June 2016) is in the lesson folder.