Unit 4. Data Acquisition
Revision Date: Jan 05, 2020 (Version 3.0)
Summary
This lesson introduces students to reading from an input file and writing to an output file in Python, as an example of using a data source to gain insight. Students then apply these concepts by programming a simple Dice Roll application that generates data. The lesson prepares students to read and write files in later Data Acquisition lessons.
Outcomes
Student computer usage for this lesson is: required
Python for Everybody by Charles Severance, http://do1.dr-chuck.com/pythonlearn/EN_us/pythonlearn.pdf.
Explanation of the COUNTIF function in Excel: https://support.office.com/en-us/article/COUNTIF-function-e0de10c6-f885-4e71-abb4-1f464816df34.
The mbox.txt and mbox-short.txt files are in the Lesson Resources Folder.
Think-Pair-Share
Have students review their journal entries as a class and note the advantages and disadvantages on a white board.
The students should code the examples in the book as the teacher proceeds through the lessons.
Discuss what possible data sources might be used as input to programs. Brainstorm ways that combining data sources, clustering data, and classifying data are parts of the process of using programs to gain insight and knowledge from data. Point out to students that data doesn't always fall from trees in exactly the quantity, quality, and format needed for your program. A programmer needs to be a critical consumer of data and know when to use multiple sources, clean the data, or sort through to find the right breadth and variety of data needed.
The COUNTIF function in Excel: the formula =COUNTIF(A1:A1000,1) counts how many 1s are in the range A1 to A1000. You can show the example on the Microsoft Office help website: https://support.office.com/en-us/article/COUNTIF-function-e0de10c6-f885-4e71-abb4-1f464816df34
Use the COUNTIF function to count how many times each number 2 through 12 was rolled with the pair of six-sided dice, and compare that distribution to the distribution for the 12-sided die.
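For teachers who want to stay in Python rather than switch to Excel, the comparison can be sketched as follows. This is a minimal sketch, not the lesson's official Dice Roll solution: the function names (`roll_pair`, `roll_d12`, `write_rolls`, `count_if`, `compare_distributions`) and the one-roll-per-line file layout are assumptions made for illustration. The `count_if` helper plays the role of COUNTIF by counting the lines in a file that match a given value.

```python
import random


def roll_pair(rng):
    """Return the sum of two six-sided dice (a value from 2 to 12)."""
    return rng.randint(1, 6) + rng.randint(1, 6)


def roll_d12(rng):
    """Return one roll of a 12-sided die (a value from 1 to 12)."""
    return rng.randint(1, 12)


def write_rolls(path, rolls):
    """Write one roll per line to an output file (assumed layout)."""
    with open(path, "w") as out_file:
        for roll in rolls:
            out_file.write(f"{roll}\n")


def count_if(path, target):
    """Count the lines equal to target, similar to =COUNTIF(range, target)."""
    count = 0
    with open(path) as in_file:
        for line in in_file:
            if line.strip() == str(target):
                count += 1
    return count


def compare_distributions(pair_path, d12_path, n_rolls=1000, seed=None):
    """Generate both data files, then count each value from 1 to 12.

    Returns a dict mapping each value to (pair_count, d12_count).
    """
    rng = random.Random(seed)  # a seed makes a classroom demo repeatable
    write_rolls(pair_path, [roll_pair(rng) for _ in range(n_rolls)])
    write_rolls(d12_path, [roll_d12(rng) for _ in range(n_rolls)])
    table = {}
    for value in range(1, 13):
        table[value] = (count_if(pair_path, value), count_if(d12_path, value))
    return table
```

Calling `compare_distributions("pair_rolls.txt", "d12_rolls.txt")` (both file names are hypothetical) writes the two data files and returns the counts. Students can print the table and discuss why the pair of dice never produces a 1 and favors the middle values, while the 12-sided die is roughly even across 1 through 12.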
Have students work in pairs as the new concepts are introduced and practiced.
For a class needing more scaffolding: Work as a group. Have students take turns around the room to read aloud the brief text in each section in Chapter 7. Do the short exercises together with a "row captain" assigned to each row (or group) in the classroom who is in charge of checking that everybody in their row has completed each short task and has gotten the help needed to finish. Row captains help each other until the entire class has successfully completed each task. Report out on what challenges were encountered, recording problems and solutions at the front of the classroom as the class works. Rotate the role of row captain for each section.
For more independent students: Introduce/demonstrate the key ideas first and then allow students to work through Chapter 7 at their own pace.
The teacher will check the student’s code for understanding.
The teacher will check for understanding as each new concept is introduced.
Exercise 7.1 Write a program to read through a file and print the contents of the file (line by line) all in upper case. Executing the program will look as follows:
python shout.py
Enter a file name: mbox-short.txt
FROM STEPHEN.MARQUARD@UCT.AC.ZA SAT JAN 5 09:14:16 2008
RETURN-PATH: <POSTMASTER@COLLAB.SAKAIPROJECT.ORG>
RECEIVED: FROM MURDER (MAIL.UMICH.EDU [141.211.14.90])
BY FRANKENSTEIN.MAIL.UMICH.EDU (CYRUS V2.3.8) WITH LMTPA;
SAT, 05 JAN 2008 09:14:16 -0500
You can download the sample input file from https://www.py4e.com/code3/mbox-short.txt
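One possible approach to Exercise 7.1, offered as a reference sketch for checking student work rather than as the book's own solution. The function names `shout` and `print_shout` are assumptions made for illustration. The reading logic is kept in a function that takes the file name as a parameter so it can be checked without typing at a prompt.

```python
def shout(file_name):
    """Return the lines of the file, each converted to upper case."""
    lines = []
    with open(file_name) as handle:
        for line in handle:
            # rstrip() drops the trailing newline so print() does not
            # produce blank lines between the lines of output.
            lines.append(line.rstrip().upper())
    return lines


def print_shout(file_name):
    """Print the file line by line in upper case."""
    for line in shout(file_name):
        print(line)
```

To match the exercise, a student's script would end with something like `print_shout(input("Enter a file name: "))`.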
Exercise 7.2 Write a program to prompt for a file name, and then read through the file and look for lines of the form:
X-DSPAM-Confidence: 0.8475
When you encounter a line that starts with “X-DSPAM-Confidence:” pull apart the line to extract the floating point number on the line. Count these lines and then compute the total of the spam confidence values from these lines. When you reach the end of the file, print out the average spam confidence.
Enter the file name: mbox.txt
Average spam confidence: 0.894128046745
Enter the file name: mbox-short.txt
Average spam confidence: 0.750718518519
Test your file on the mbox.txt and mbox-short.txt files.
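A minimal sketch of Exercise 7.2, again intended as a reference for the teacher rather than the book's own solution. The names `PREFIX` and `average_spam_confidence` are assumptions made for illustration, and returning `None` when no matching lines are found is a design choice of this sketch, not something the exercise specifies.

```python
PREFIX = "X-DSPAM-Confidence:"


def average_spam_confidence(file_name):
    """Return the average of the X-DSPAM-Confidence values in the file.

    Returns None if the file contains no matching lines.
    """
    count = 0
    total = 0.0
    with open(file_name) as handle:
        for line in handle:
            if not line.startswith(PREFIX):
                continue
            # Everything after the prefix is the floating point number.
            value = float(line[len(PREFIX):].strip())
            count += 1
            total += value
    if count == 0:
        return None
    return total / count
```

A student's script would prompt with `input("Enter the file name: ")`, pass the result to `average_spam_confidence`, and print the returned value after the text `Average spam confidence:`. The guard against a zero count avoids a division error on files that contain no matching lines.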