I have to write the following programme:
Write a program to read through the mbox-short.txt and figure out the distribution by hour of the day for each of the messages. You can pull the hour out from the 'From ' line by finding the time and then splitting the string a second time using a colon.
From stephen.marquard@uct.ac.za Sat Jan 5 09:14:16 2008
Once you have accumulated the counts for each hour, print out the counts, sorted by hour as shown below.
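The task above can be sketched as a whole program. This is a minimal sketch, not the poster's code or an official course solution. `hour_counts` and `print_hour_counts` are hypothetical helper names, the sample lines are made up, and it assumes every "From " line has the time as its sixth whitespace-separated field, as in the example line.

```python
def hour_counts(lines):
    """Count messages per hour from mbox "From " lines (hypothetical helper)."""
    counts = {}
    for line in lines:
        # The lines we want start with "From " (with a space);
        # header lines such as "From:" do not match.
        if not line.startswith("From "):
            continue
        words = line.split()
        if len(words) < 6:
            continue  # skip malformed lines instead of raising IndexError
        hour = words[5].split(":")[0]  # '09' from '09:14:16', kept as a string
        counts[hour] = counts.get(hour, 0) + 1
    return counts


def print_hour_counts(counts):
    """Print 'hour count' pairs sorted by hour (hypothetical helper)."""
    for hour, count in sorted(counts.items()):
        print(hour, count)


# With the real file this would be something like:
#     fname = input("Enter file:")
#     with open(fname) as carpeta:
#         print_hour_counts(hour_counts(carpeta))
sample = [
    "From stephen.marquard@uct.ac.za Sat Jan 5 09:14:16 2008\n",
    "From: stephen.marquard@uct.ac.za\n",
    "From someone@example.com Fri Jan 4 18:10:48 2008\n",
    "From other@example.com Fri Jan 4 09:05:01 2008\n",
]
print_hour_counts(hour_counts(sample))
# 09 2
# 18 1
```

Keeping the hour as a string preserves the leading zero that the desired output shows.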
I’ve written this:
fname = input("Enter file:")
carpeta = open(fname)
d = dict()
You can write:
d = {}
for this if you prefer.
for line in carpeta:
    if line.startswith("From "):
        words = line.split()
        for word in words:
            if ":" in word:
The mbox From_ line has a very rigid format. You could just reach for
words[5] directly (counting from zero, the time is the sixth field).
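To check the index, here is a quick sketch that splits the sample From line from the assignment. It assumes the line has the same seven fields as that example.

```python
# The sample "From " line from the assignment text.
line = "From stephen.marquard@uct.ac.za Sat Jan 5 09:14:16 2008"

words = line.split()
print(words)
# ['From', 'stephen.marquard@uct.ac.za', 'Sat', 'Jan', '5', '09:14:16', '2008']

# Counting from zero, the time field is at index 5.
print(words[5])  # 09:14:16
```

str.split() with no argument splits on runs of whitespace, so an extra space padding a single-digit day would not shift the index.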
                word1 = word[0:1]
I’d print word1 here. I do not think it contains what you want.
You might be best off calling split again, splitting that specific
word on the colon, e.g.:
    time_parts = words[5].split(':')
Then convert time_parts[0] into an int. Or leave it alone as a string,
but we tend to consider hours to be integers.
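A small sketch of that second split, using the time field from the sample line, showing both the string and the int form of the hour:

```python
time_field = "09:14:16"  # words[5] from the sample line

time_parts = time_field.split(":")
print(time_parts)  # ['09', '14', '16']

hour_str = time_parts[0]  # '09', a string that keeps the leading zero
hour_int = int(hour_str)  # 9, an int; the leading zero is gone
print(hour_str, hour_int)  # 09 9
```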
                if word1 not in d:
                    d[word1] = 1
                else:
                    d[word1] = d[word1] + 1
Looks ok.
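The if/else counting works as written. An equivalent, slightly shorter idiom uses dict.get() with a default value; this is just an alternative, not a required change. The list of hours here is made up for illustration.

```python
hours = ["09", "18", "16", "09", "16", "16"]  # made-up sample hours

d = {}
for hour in hours:
    # get() returns 0 for a missing key, so no if/else is needed.
    d[hour] = d.get(hour, 0) + 1

print(d)  # {'09': 2, '18': 1, '16': 3}
```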
res = list()
for word in d:
    res.append(word)
This iterates over the keys of d, which are supposed to be hours. Use
the name "hour" instead of "word"; it makes things clearer.
You can get the keys of a dict via its keys() method, or you can just
use the fact that iterating over a dict gives you the keys (which you're
using for the loop above anyway). The list() constructor accepts any
iterable, so this:
    res = list(d)
gets you a list of the keys.
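A quick sketch with made-up counts, showing that list(d) and list(d.keys()) give the same list of keys:

```python
d = {"09": 2, "18": 1, "16": 3}  # made-up counts

res = list(d)          # iterating a dict yields its keys
print(res)             # ['09', '18', '16']
print(list(d.keys()))  # the same list, via the keys() method
```

The keys come out in insertion order, not sorted order, so sorting is still a separate step.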
print(res.sort())
Personally I’d write:
print(sorted(res))
The expression res.sort() actually sorts the list in place, i.e. it has
side effects, and it returns None, so print(res.sort()) prints None
rather than the sorted list. You might want the in-place sort, but if
you don't, I tend to avoid the side effect.
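A short sketch of the difference between sorted() and list.sort(), using a made-up list of hours:

```python
res = ["16", "09", "18"]

new_list = sorted(res)  # returns a new, sorted list
print(new_list)         # ['09', '16', '18']
print(res)              # ['16', '09', '18'] - unchanged

result = res.sort()     # sorts res in place...
print(result)           # None - ...and returns None
print(res)              # ['09', '16', '18']
```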
The desired output is:
04 3
06 1
07 1
09 2
10 3
Your print only shows the hours, not their counts. If you want the
counts, you should iterate over d.items(), e.g.:
    for hour, count in d.items():
        print(hour, count)
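Since the assignment wants the output sorted by hour, here is a small sketch that sorts the (hour, count) pairs from items() before printing. Tuples compare by their first item first, so this orders by hour. The counts are made up.

```python
d = {"09": 2, "18": 1, "04": 3}  # made-up counts

# Sorting the (hour, count) pairs orders them by hour.
for hour, count in sorted(d.items()):
    print(hour, count)
# 04 3
# 09 2
# 18 1
```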
I notice that there are leading zeroes in the desired output above. That
suggests you should be treating the hours as strings, and not converting
them to ints.
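The leading zeroes also matter for sorting. A quick sketch, assuming (as in the sample line) that the hours in the file are always two digits: zero-padded strings sort in numeric order, unpadded ones do not, and int keys can be padded back to two digits when printing. The values are made up.

```python
# With zero-padded two-digit strings, string order matches numeric order.
as_strings = ["10", "04", "09"]
print(sorted(as_strings))  # ['04', '09', '10']

# Without the padding, string order goes wrong: "10" sorts before "4".
unpadded = ["10", "4", "9"]
print(sorted(unpadded))  # ['10', '4', '9']

# If the hours are ints, format them back to two digits when printing.
counts = {4: 3, 10: 3, 9: 2}
for hour, count in sorted(counts.items()):
    print(f"{hour:02d} {count}")
# 04 3
# 09 2
# 10 3
```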
But my programme does not work. I’ve tried to use a dictionary, as I want the hour as key and the number of times as value, but then do I need to use a list to use “sort”? How can I put the words in my dictionary into the list? And I think the loop I’ve written does not get the hour…
I’m pretty sure your word1=word[0:1] is producing an incorrect result.
Print it out! See what it does.
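Doing exactly that with the time word from the sample line shows the problem: the slice [0:1] is only one character long.

```python
word = "09:14:16"  # the word containing ":" on the sample From line
word1 = word[0:1]
print(repr(word1))  # '0' - just the first character, not the hour
print(len(word1))   # 1
```

With this slice, hours 04, 06, 07 and 09 would all be counted under the same key '0'.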
Cheers,
Cameron Simpson cs@cskk.id.au