Lists, dictionaries, tuples, sort

Hi!
I have to write the following program:
Write a program to read through the mbox-short.txt and figure out the distribution by hour of the day for each of the messages. You can pull the hour out from the 'From ' line by finding the time and then splitting the string a second time using a colon.

From stephen.marquard@uct.ac.za Sat Jan 5 09:14:16 2008

Once you have accumulated the counts for each hour, print out the counts, sorted by hour as shown below.

I’ve written this:
fname = input("Enter file:")
carpeta=open(fname)
d=dict()
for line in carpeta:
    if line.startswith ("From "):
        words=line.split()
        for word in words:
            if ":" in word:
                word1=word[0:1]
                if word1 not in d:
                    d[word1]=1
                else:
                    d[word1]=d[word1]+1
res = list()
for word in d:
    res.append(word)

print(res.sort())

The desired output is:
04 3
06 1
07 1
09 2
10 3

But my program does not work. I’ve tried to use a dictionary, since I want the hour as the key and the number of times as the value, but then do I need a list in order to use “sort”? How can I put the keys of my dictionary into a list? Also, I think the loop I’ve written does not actually get the hour…

I’ve written this:
fname = input("Enter file:")
carpeta=open(fname)
d=dict()

You can write:

d = {}

for this if you prefer.

for line in carpeta:
    if line.startswith ("From "):
        words=line.split()
        for word in words:
            if ":" in word:

The mbox From_ line has a very rigid format. You could just reach for
words[5] directly: in the sample line the time is the sixth
whitespace-separated field.

           word1=word[0:1]

I’d print word1 here. I do not think it contains what you want.

You might be best calling split again, to split the specific word on
the colon, e.g. time_parts = words[5].split(':'). Then convert
time_parts[0] into an int. Or leave it alone as a string, but we
tend to consider hours to be integers.
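
A minimal sketch of that second split, using the sample line from the exercise. The variable names are only illustrative, and keeping the hour as a string is an assumption made here so that the leading zero survives. Note that in this sample the time is the sixth whitespace-separated field, i.e. index 5.

```python
# Sample "From " line taken from the exercise statement.
line = "From stephen.marquard@uct.ac.za Sat Jan 5 09:14:16 2008"

words = line.split()              # split on whitespace
time_parts = words[5].split(":")  # split the time field on the colon
hour = time_parts[0]              # first part is the hour, still a string

print(words)
print(time_parts)
print(hour)
```

Printing the intermediate values like this is also a quick way to check which index actually holds the time.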

           if word1 not in d:
               d[word1]=1
           else:
               d[word1]=d[word1]+1

Looks ok.

res = list()
for word in d:
    res.append(word)

This iterates over the keys of d, which are supposed to be hours. Use
the name “hour” instead of “word”; it makes things clearer.

You can get the keys of a dict via a method, or you can just use the
fact that iterating over a dict gives you the keys (which you’re using
for the loop above anyway). The list() constructor accepts any
iterable, so this:

res = list(d)

gets you a list of the keys.
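
As a small illustration (the dict contents are made up for the example), both spellings give the same list of keys:

```python
# Hypothetical counts, keyed by hour string.
d = {"09": 2, "04": 3, "10": 3}

keys_via_list = list(d)            # iterating a dict yields its keys
keys_via_method = list(d.keys())   # the explicit method gives the same keys

print(keys_via_list)
print(keys_via_list == keys_via_method)
```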

print(res.sort())

Personally I’d write:

print(sorted(res))

The expression res.sort() actually sorts the list in place, i.e. it has
side effects, and it returns None, so print(res.sort()) prints None
rather than the sorted list. You might want the in-place sort, but if
you don’t I tend to avoid the side effect.
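
A short sketch of the difference, with made-up sample data. The point to notice is the return value of each call:

```python
res = ["09", "04", "10"]

in_place_result = res.sort()   # sorts res itself; the method returns None
print(in_place_result)
print(res)

original = ["09", "04", "10"]
new_list = sorted(original)    # returns a new sorted list, original untouched
print(new_list)
print(original)
```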

The desired output is:
04 3
06 1
07 1
09 2
10 3

Your print just prints the hours. If you want counts you should iterate
over the dict.items(), eg:

for hour, count in d.items():
    print(hour, count)

I notice that there are leading zeroes in the desired output above. That
suggests you should be treating the hours as strings, and not converting
them to ints.
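
Putting those two points together, here is a hedged sketch of the printing step. The dict below is filled by hand with the counts from the desired output, purely for illustration. Sorting the string keys works here because every hour is a zero-padded two-digit string.

```python
# Hand-filled counts matching the desired output (illustrative only).
d = {"09": 2, "04": 3, "10": 3, "06": 1, "07": 1}

output_lines = []
for hour, count in sorted(d.items()):   # (hour, count) tuples, ordered by hour
    output_lines.append(f"{hour} {count}")

print("\n".join(output_lines))
```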

But my program does not work. I’ve tried to use a dictionary, since I want the hour as the key and the number of times as the value, but then do I need a list in order to use “sort”? How can I put the keys of my dictionary into a list? Also, I think the loop I’ve written does not actually get the hour…

I’m pretty sure your word1=word[0:1] is producing an incorrect result.
Print it out! See what it does.
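
For instance, trying the slice on the time string from the sample line (a quick experiment, not part of the original program):

```python
word = "09:14:16"

first_char = word[0:1]           # a slice of length 1
first_two = word[0:2]            # the first two characters
via_split = word.split(":")[0]   # everything before the first colon

print(repr(first_char), repr(first_two), repr(via_split))
```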

Cheers,
Cameron Simpson cs@cskk.id.au

Hi again,
I’ve tried it in several ways and it does not work. My last try:

fname = input("Enter file:")
carpeta=open(fname)
d={}

for line in carpeta:
    if line.startswith ("From "):
        words=line.split()
        for word in words:
            hora=words[5].split(':')
            if hora not in d:
                d[hora]=1
            else:
                d[hora]=d[hora]+1

lst = list(d.keys())

lst.sort()
for key in lst:
    print(key, d[key])

This way prints nothing.
Thank you for your help!

Put some print()s in the for-loop, particularly in the stuff parsing the
words. See what they say. It should be evident that things are not right
somewhere.

Cheers,
Cameron Simpson cs@cskk.id.au

Start by checking whether the file actually does have any lines
beginning with “From”:

for line in carpeta:
    if line.startswith ("From "):
        print("found a From line")

If that never prints, then you don’t actually have any lines starting
with From, and your dict will be empty.

If your file is short enough that you aren’t going to be drowned with
thousands of lines of output, you could also do this:

for line in carpeta:
    print(len(line), line[:20])
    if line.startswith ("From "):
        print("found a From line")

which will print the length of the line and the first twenty characters
of it.

Also, look at this bit of code:

words=line.split()
for word in words:
    hora=words[5].split(':')
    if hora not in d:

So you take a line that starts with “From”, and you split it into words.
Then you look at each word individually, and each time you get the
sixth word (words[5]) and split it on the colon. That gives you a list:

>>> '12:34'.split(':')
['12', '34']

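What do you do with that list? You look to see if it is in the
dictionary d. It will never be in the dict, it cannot be in the dict:
a list is unhashable, so it cannot be used as a dict key, and the
membership test itself raises TypeError.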

So there’s one problem right there.
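
A small experiment that shows this, assuming nothing beyond the built-in dict and list types. The try/except is only there so the sketch runs to the end:

```python
d = {}
hora = "12:34".split(":")   # a list of two strings

try:
    hora in d               # the membership test has to hash the key first
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)

hour = hora[0]              # a single string is hashable
d[hour] = 1                 # so it works as a dict key
print(d)
```

Taking one element of the list (here the first) gives a string, which is what the dictionary needs as a key.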

I’ve done this and it works now. However, I do not understand the expression d[t1]=d.get(t1,0)+1
As I understand it, I am getting the value for the key t1 and adding it to the dictionary d. But what about the “0” and the “1”? Are they the value of the key? How?

fname = input("Enter file:")
fopen=open(fname, "r")
d={}

for line in fopen:
    if line.startswith ("From "):
        spline=line.split()
        time=spline[5]
        tsplit=time.split(":")
        t1=tsplit[0]
        d[t1]=d.get(t1,0)+1 # Using this line we created a dictionary having keys and values

lst=[]
for k1,v1 in d.items():
    lst.append((k1,v1))
lst.sort()

for k1,v1 in lst: # we are able to access this list using key value pair
    print(k1,v1)

Hi Izan,

The line:

d[t1] = d.get(t1, 0) + 1

Says:

  • Look up the key t1 in the dict, and use its value
  • if the key is missing, use 0 as the value instead
  • add 1 to the value
  • and store it back in the dict.
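
To make those steps concrete, here is a sketch that counts a made-up list of hours twice: once with d.get() and once with the longer if/else form from earlier in the thread. The names hours and d2 are only for the example.

```python
hours = ["09", "04", "09", "10"]   # made-up sample data

# Short form: get() returns the current count, or 0 if the key is missing.
d = {}
for t1 in hours:
    d[t1] = d.get(t1, 0) + 1

# Longer form that does the same thing step by step.
d2 = {}
for t1 in hours:
    if t1 not in d2:
        d2[t1] = 1
    else:
        d2[t1] = d2[t1] + 1

print(d)
print(d == d2)
```

So the 0 is the default used the first time a key is seen, and the 1 is the amount added for the current line.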