Median value problem

I have the following assignment:

" Write a program that prompts for a file name, then opens that file and reads through the file, looking for lines of the form: X-DSPAM-Confidence: 0.8475 Count these lines and extract the floating point values from each of the lines and compute the average of those values and produce an output as shown below. Do not use the sum() function or a variable named sum in your solution. You can download the sample data at http://www.py4e.com/code3/mbox-short.txt when you are testing below enter mbox-short.txt as the file name."

The output must be: “Average spam confidence: 0.7507185185185187”

The code I’m writing is the following:

import statistics
fname = input("Enter file name: ")
fh = open(fname)
for line in fh:

    if not line.startswith("X-DSPAM-Confidence:"):
        continue
    v = (line[20:26])
    value = float(v)
    med = statistics.median([value])
    print(med)

What this program should do is read a list of values from the file and calculate the mean.

But I’m getting the error: “ImportError: No module named statistics on line 4”

Could someone help, please.

Thanks.

I suspect that you mistyped import statistics. statistics is in line 10, not 4, so what you posted is not what you ran. The assignment is for the mean, not the median, and since sum() is not allowed, I suspect that you are not supposed to use the statistics module, and probably not math.fsum. There are other mistakes.

“import statistics” is in line 4 of the original code.

What your script is doing is not what you want it to do: there is no difference between value and med, because med is the median of a single value, so the script simply displays 27 values.
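A small sketch of that point, using made-up numbers rather than the real file: the median of a one-element list is just that element, so calling it inside the loop changes nothing.

```python
import statistics

values = [0.8475, 0.6178, 0.9907]  # made-up sample values, not the real data

# The median of a one-element list is simply that element,
# so this reproduces the input values unchanged.
singles = [statistics.median([v]) for v in values]

# The median of the whole list is one number: the middle value once sorted.
overall = statistics.median(values)

print(singles)
print(overall)
```

The statistic only becomes meaningful once all the values are collected and passed in together.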

As is, I can’t reproduce your error and my system imports the statistics module.

The above aside, you should use a file handler (a with block), so that the file is closed automatically once the block exits:

import statistics

fname = input("Enter file name: ")
with open(fname) as fh: # fh is the file handler
    for line in fh:
        if not line.startswith("X-DSPAM-Confidence:"):
            continue
        v = (line[20:26])
        value = float(v)
        med = statistics.median([value])
        print(med)
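If you want to convince yourself that the with block really closes the file, here is a minimal sketch. It writes a throwaway temporary file (an assumption made only so the example is self-contained) and checks fh.closed inside and after the block.

```python
import os
import tempfile

# Create a throwaway file so the example does not depend on mbox-short.txt.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("X-DSPAM-Confidence: 0.8475\n")
    path = tmp.name

with open(path) as fh:
    first_line = fh.readline()
    open_inside = not fh.closed  # the file is still open inside the block

closed_after = fh.closed  # the file was closed when the block ended

os.remove(path)  # clean up the temporary file
print(open_inside, closed_after)
```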

I’ve not corrected anything other than the file issue, but if you want the full solution, I can post it for you. That said, you’d learn more by doing this for yourself.

Hints:

You don’t need the statistics module, or any other module.
You could use a list object to hold all the values.

I found another way to calculate the median. Now the output is a list of accumulated median values, of which I want to print just the last one:

My question now is how to extract only the last value (0.7507185185185187) and print it.

Thanks.

fname = input("Enter file name: ")
fh = open(fname)
i = 0
tot = 0
for line in fh:
    if not line.startswith("X-DSPAM-Confidence:"):
        continue
    v = (line[20:26])
    value = float(v)
    i = i + 1
    tot = tot + value
    av = (float(tot/i))
    print(av)

Yeah, that works.
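As for printing only the last value: the running list appears because the division and the print run on every matching line. One way to get a single result is to do both once, after the loop. A sketch, assuming a made-up in-memory sample in place of the real file:

```python
import io

# Made-up sample standing in for the real file contents.
sample = io.StringIO(
    "From: someone@example.com\n"
    "X-DSPAM-Confidence: 0.8475\n"
    "Subject: hello\n"
    "X-DSPAM-Confidence: 0.6178\n"
)

count = 0
total = 0.0
for line in sample:
    if not line.startswith("X-DSPAM-Confidence:"):
        continue
    count += 1
    total += float(line[20:])  # float() ignores the trailing newline

# Divide and print once, after the loop has seen every line.
average = total / count
print(f"Average spam confidence: {average}")
```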

Of note: if you insist on not using a file handler, then you should close the file with fh.close(), but then the name fh becomes meaningless, so maybe use the name file instead.

You can print your result with print(f"Average spam confidence: {av}")

Your code with some small change and including the close file:

fname = input("Enter file name: ")
file = open(fname)
i = 0
tot = 0
for line in file:
    if not line.startswith("X-DSPAM-Confidence:"):
        continue
    value = float(line[20:])
    i += 1
    tot += value
file.close()

av = float(tot / i)
print(f"Average spam confidence: {av}")

Some examples of alternative ways to do this:

xdspam = [] # list object to hold the values

fname = input("Enter file name: ")
with open(fname, encoding='UTF-8') as fh:  # fh is the file handler
    for line in fh:
        if not line.startswith("X-DSPAM-Confidence:"):
            continue
        value = (line[20:])  # be descriptive when naming objects
        xdspam.append(float(value))

SPAM_VALUE = 0
for value in xdspam:
    SPAM_VALUE += value
print(f"Average spam confidence: {SPAM_VALUE / len(xdspam)}")

print()
# this does the same as the above using the sum() function
print(f"Average spam confidence: {sum(xdspam) / len(xdspam)}")

# if you're not allowed to use the len() function
print()
SPAM_VALUE = 0
for count, value in enumerate(xdspam, 1):  # without the one, count would start at zero
    SPAM_VALUE += value

AVERAGE_SPAM_CONFIDENCE = SPAM_VALUE / count
print(f"Average spam confidence: {AVERAGE_SPAM_CONFIDENCE}")

There are other ways also, but one has to draw the line somewhere.

As a beginner, you’ve made a good start, along with some rookie errors, but we’ve all been there and had to learn from the mistakes we’ve made.

Just to add: the reason that I’d use a list object for this kind of project is that once the data is stored in a list, it’s very easy to access it again without having to reopen the file and read the data in a second time. Not a huge issue with this small project, but it could be worth remembering for larger ones.
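A rough illustration of that, with made-up values standing in for what would have been read from the file: once the numbers are in a list, several questions can be answered from it without touching the file again.

```python
# Made-up values, as if they had already been read from the file.
confidences = [0.8475, 0.6178, 0.9907]

# Accumulate without sum(), as the assignment requires.
total = 0.0
for value in confidences:
    total += value
average = total / len(confidences)

# The same list answers other questions with no second pass over the file.
highest = max(confidences)
lowest = min(confidences)
above_average = [v for v in confidences if v > average]

print(average, highest, lowest, above_average)
```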

Happy coding.

The statistics module appeared in Python 3.4. Could you have an old Python?
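A quick way to check, sketched with only the standard sys module (no f-strings, so it also runs on older interpreters):

```python
import sys

# statistics was added to the standard library in Python 3.4.
has_statistics = sys.version_info >= (3, 4)

print("Running Python " + sys.version.split()[0])
if has_statistics:
    import statistics
    print("statistics imported fine")
else:
    print("This Python predates the statistics module.")
```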

Cheers,
Cameron Simpson cs@cskk.id.au

Thanks very much!!

That’s not the median. You are computing some sort of running mean, which is not the same as the median.

The statistics module contains a function to compute the median. You should use that.

import statistics  # Put this on your first line.

fname = input("Enter file name: ")
with open(fname) as fh:
    data = []  # Collect the data we want to use.
    for line in fh:
        if not line.startswith("X-DSPAM-Confidence:"):
            continue
        v = line[20:26]  # Extract the six characters of data.
        value = float(v)
        data.append(value)

# Now we can calculate whatever statistics are needed.

print("Median:", statistics.median(data))
print("Mean (average):", statistics.mean(data))
print("Standard deviation:", statistics.stdev(data))
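To see why a running mean is not the median, here is a small sketch with made-up numbers. It also shows that the last element of the running list, running_means[-1], is the overall mean.

```python
import statistics

data = [0.2, 0.9, 0.4, 1.0]  # made-up values, not the real data

# Build the running mean: the mean of everything seen so far.
running_means = []
total = 0.0
for count, value in enumerate(data, 1):
    total += value
    running_means.append(total / count)

final_mean = running_means[-1]    # last running mean equals the overall mean
median = statistics.median(data)  # middle of the sorted values

print("Running means:", running_means)
print("Final mean:", final_mean)
print("Median:", median)
```

With these numbers the mean and the median come out different, which is the usual case unless the data happens to be symmetric.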

You’re welcome.