Learners will be able to:
- Know how to find particular parts of data in text
- Retrieve and print this data out
- Use this technique for several lines of text
In this lesson you will learn how to find particular data in a line of text. Imagine you had thousands of lines of text containing 100 email addresses that need finding. You could search for them by hand, or you could use a simple Python program to find them.
Part 1: Finding an address - sort of!
- Look at and run the code below. What does it do?
- Look at the variable a. What is this? Can you get it to print out the @ symbol?
- To find the email address, you need to find where it starts. It starts after the "From:", so use text.find(":") to find the position where the email address begins.
- Then print out the text starting at the position that the code found.
- You will notice that too many symbols are printed. How can this be fixed? (Clue: +1)
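The steps in Part 1 can be sketched as follows. This is only an illustration: the sample line, the address in it and the variable name at_pos are made up for this example, and it assumes there is no space between "From:" and the address. Your own code box may use different text.

```python
# Hypothetical sample line (an assumption, not the worksheet's own text).
text = "From:alice@example.com Sat Jan 5 2008"

# find() returns the position (index) of the first match, counting from 0.
at_pos = text.find("@")
print(at_pos)            # the position, as a number
print(text[at_pos])      # the character stored at that position

# Position of the colon that ends "From:".
email_pos = text.find(":")

# Slicing from email_pos includes the colon itself ...
with_colon = text[email_pos:]
print(with_colon)

# ... so adding 1 starts the slice one character later.
without_colon = text[email_pos + 1:]
print(without_colon)
```

If the line had a space after the colon, the slice would need to start one character further along.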
Part 2: Finding the complete email address
- Now that you have the position of the beginning of the email address, you need to find its end. Use code similar to email_pos = text.find(":") to find the end of the email address.
- Add the end position to line 13
- Edit the code below to print out the full email address
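One possible way to complete Part 2 is sketched below. It assumes a made-up sample line in which the address follows "From:" directly and is followed by a space, so the first space in the line marks the end of the address. The name end_pos is chosen for this example only.

```python
# Hypothetical sample line (an assumption, not the worksheet's own text).
text = "From:alice@example.com Sat Jan 5 2008"

# Start: the colon that ends "From:".
email_pos = text.find(":")

# End: the first space in the line, which here comes right after the address.
end_pos = text.find(" ")

# A slice text[start:end] keeps everything from start up to, but not including, end.
email = text[email_pos + 1:end_pos]
print(email)
```

If the line contained a space before the address, text.find(" ", email_pos + 2) could be used instead, since find accepts an optional position to start searching from.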
Part 3: Finding more than one email address
- Now you can find an email address, you can use the code to look through several lines and return all email addresses.
- Create a for loop on line 7 which looks through each line, finds the positions and prints the email address, then moves on to the next line, and so on.
- Clues: for line in, change to line,
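A minimal sketch of the loop in Part 3 is shown here. The list of lines and the addresses in it are invented for illustration, and the same layout as before is assumed (address directly after "From:", then a space).

```python
# Hypothetical sample lines (assumptions for this example only).
lines = [
    "From:alice@example.com Sat Jan 5 2008",
    "From:bob@example.org Sun Jan 6 2008",
    "From:carol@example.net Mon Jan 7 2008",
]

emails = []
for line in lines:
    # The same two searches as before, but on line instead of text.
    email_pos = line.find(":")
    end_pos = line.find(" ")
    email = line[email_pos + 1:end_pos]
    emails.append(email)
    print(email)
```

Collecting the results in a list as well as printing them makes it easy to count or reuse the addresses later.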
Part 4: Find the Jobs
- Using what you have learnt and the code box below, create a program which extracts the jobs from the lines of text.
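The same idea can be applied to Part 4. The sketch below is only a guess at the layout: it assumes each line contains a label "Job:" followed by a one-word job title, which may not match the lines in your code box. The sample lines and the label are hypothetical.

```python
# Hypothetical sample lines and label (assumptions, adjust to the real text).
lines = [
    "Name:Alice Job:Engineer City:Leeds",
    "Name:Bob Job:Teacher City:York",
    "Name:Carol City:Hull",
]
label = "Job:"

jobs = []
for line in lines:
    label_pos = line.find(label)
    if label_pos == -1:
        # find() returns -1 when the label is not in the line, so skip it.
        continue
    start = label_pos + len(label)      # first character after the label
    end_pos = line.find(" ", start)     # first space after the job starts
    if end_pos == -1:
        end_pos = len(line)             # the job is the last item on the line
    job = line[start:end_pos]
    jobs.append(job)
    print(job)
```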
Part 5: Look for the confidence of the SPAM emails
- This project will use the skills you have learnt to open a file and read through it, looking for lines like "X-DSPAM-Confidence: 0.8475".
- Extract the number from each of these lines and print it.
- Extension: Count these lines, convert the number on each one to a floating point value, and compute the average of those values.
- Answer = Average spam confidence: 0.750718518519
- Download the two files below and ensure that they are both saved into the same folder
- When prompted to open the file, type words.txt
- Good luck
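As a starting point for Part 5, here is a rough sketch of the extraction and averaging steps. To keep it runnable without the downloaded files, it works on a short list of made-up lines. The function name confidence_values and the sample values other than 0.8475 are assumptions for this example, so the average it prints is not the answer for the real file.

```python
def confidence_values(lines):
    """Return the numbers found on lines that start with X-DSPAM-Confidence:"""
    prefix = "X-DSPAM-Confidence:"
    values = []
    for line in lines:
        if line.startswith(prefix):
            colon_pos = line.find(":")
            # float() ignores the surrounding spaces and the newline.
            values.append(float(line[colon_pos + 1:]))
    return values


# Hypothetical sample lines standing in for the contents of a file.
sample_lines = [
    "X-DSPAM-Confidence: 0.8475\n",
    "Subject: an unrelated line\n",
    "X-DSPAM-Confidence: 0.6178\n",
]

values = confidence_values(sample_lines)
for value in values:
    print(value)

if values:
    average = sum(values) / len(values)
    print("Average spam confidence:", average)
```

To use it on a real file, the same function can be given an open file, since a file can be looped over line by line: open the file with open(filename) and pass the result in place of sample_lines.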