Hello,
I have the following assignment:
Write a program to read through the mbox-short.txt and figure out the distribution by hour of the day for each of the messages. You can pull the hour out from the 'From ' line by finding the time and then splitting the string a second time using a colon.
From stephen.marquard@uct.ac.za Sat Jan 5 09:14:16 2008
Once you have accumulated the counts for each hour, print out the counts, sorted by hour as shown below.
The data can be found in the following link: https://www.py4e.com/code3/mbox-short.txt?PHPSESSID=3a64fe134f5f073f3911c47546619bcc
04 3
06 1
07 1
09 2
10 3
11 6
14 1
15 2
16 4
17 2
18 1
19 1
What I have tried:
1  name = input("Enter file:")
2  if len(name) < 1:
3      name = "mbox-short.txt"
4  handle = open(name)
5  counts = dict();
6  for line in handle:
7      if line.startswith('From:'):
8          pass
9      elif line.startswith('From'):
10         x = line.split();
11         time = x[5];
12         t = time.split(':');
13         hour = t[0];
14         for line in hour:
15             hoursorted = sorted(hour);
16             counts[line] = counts.get(line,0) + 1;
17
18             print(hoursorted, counts[line]);
The output I'm getting is:
['0', '9'] 1
['0', '9'] 1
['1', '8'] 1
['1', '8'] 1
['1', '6'] 2
['1', '6'] 1
['1', '5'] 3
['1', '5'] 1
['1', '5'] 4
['1', '5'] 2
['1', '4'] 5
['1', '4'] 1
['1', '1'] 6
['1', '1'] 7
['1', '1'] 8
['1', '1'] 9
['1', '1'] 10
['1', '1'] 11
['1', '1'] 12
['1', '1'] 13
['1', '1'] 14
['1', '1'] 15
['1', '1'] 16
['1', '1'] 17
['0', '1'] 18
['0', '1'] 2
['0', '1'] 19
['0', '1'] 3
['0', '1'] 20
['0', '1'] 4
['0', '9'] 5
['0', '9'] 2
['0', '7'] 6
['0', '7'] 1
['0', '6'] 7
['0', '6'] 2
['0', '4'] 8
['0', '4'] 2
['0', '4'] 9
['0', '4'] 3
['0', '4'] 10
['0', '4'] 4
['1', '9'] 21
['1', '9'] 3
['1', '7'] 22
['1', '7'] 2
['1', '7'] 23
['1', '7'] 3
['1', '6'] 24
['1', '6'] 3
['1', '6'] 25
['1', '6'] 4
['1', '6'] 26
['1', '6'] 5
As you can see, the two digits of each hour are being separated. If I only print(hour), I get a column of unsorted numbers, but they aren't separated by commas or surrounded by brackets. I'm trying to sort them as a column and put the total number of times each hour appears, taken from 'counts', on the right, as in the expected output above.
I think my problem is with lines 14 and 15; it's clear that this is not the right way to sort a column. I searched the web and found that it is possible to do it with sort_values() using pandas, but the environment I'm using doesn't allow me to install pandas.
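The separation can be reproduced on its own, which may help narrow down lines 14 and 15. This is only a small sketch: the value "09" is an assumed sample of what `hour` holds after line 13, and `chars` is a name made up for the illustration. Looping over a string visits one character at a time, and sorted() applied to a string returns a list of its characters, which would explain both the split digits and the brackets.

```python
hour = "09"  # assumed sample value, like the result of t[0] on line 13

# Looping over a string yields its characters, not the whole string.
chars = [ch for ch in hour]

# sorted() on a string returns a list of its single characters.
hoursorted = sorted(hour)

print(chars)
print(hoursorted)
```

Both prints show a two-element list of one-character strings rather than the string "09" itself.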
Could someone please clarify how I could sort this list without splitting each hour into two separate digits and without the brackets?
Thank you.
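One possible approach, sketched under some assumptions rather than offered as the definitive answer: use the whole hour string as the dictionary key, and sort only once, after every line has been counted. The function name `count_hours` and the `sample` lines are hypothetical; apart from the first line, which is the one quoted in the assignment, the sample lines are invented so the block runs without the data file. No pandas is needed, since the built-in sorted() orders the (hour, count) pairs.

```python
def count_hours(lines):
    """Count messages per hour, using only lines that start with 'From '."""
    counts = {}
    for line in lines:
        # 'From ' with the trailing space skips the 'From:' header lines.
        if not line.startswith("From "):
            continue
        words = line.split()
        if len(words) < 6:
            continue  # too short to contain a time field
        hour = words[5].split(":")[0]  # '09:14:16' -> '09'
        counts[hour] = counts.get(hour, 0) + 1
    return counts


# Made-up sample input (only the first line comes from the assignment text).
sample = [
    "From stephen.marquard@uct.ac.za Sat Jan 5 09:14:16 2008",
    "From: stephen.marquard@uct.ac.za",
    "From someone@example.com Fri Jan 4 18:10:48 2008",
    "From another@example.com Fri Jan 4 09:05:31 2008",
]

counts = count_hours(sample)

# Sort the (hour, count) pairs after counting, then print one pair per line.
for hour, count in sorted(counts.items()):
    print(hour, count)
```

To run it on the real data, the file handle can be passed in place of `sample`, for example `count_hours(open(name))`, since iterating over a file also yields one line at a time. Sorting the hour strings works here because they are all two digits with a leading zero.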