Calculate Area Under Curve Given a Set of Coordinates

Knox · March 6, 2023, 7:32pm

Hi,

New coder here, so please forgive any rookie mistakes.

I have a data set in Excel consisting of several hundred x and y values based on experimental data. There is no associated function for the resulting curve. The x values do not have a fixed interval (e.g. the x values look like 1, 3, 9, 11, 20, etc; no fixed interval).

I need to calculate the Area Under Curve (AUC) for this dataset.

Based on my internet research I think I can use openpyxl to calculate the AUC. However, I’m not sure how to code it. I found a post on another forum that discusses how to calculate the AUC when the x values have a constant step, but I can’t find anything on how to calculate AUC when the x values don’t have a constant step and when there is no associated function.

I can plot the graph using Excel, but I can’t figure out how to calculate the AUC using Python.

Can anyone help me with the code to import my Excel spreadsheet of X and Y values into Python, and then how to calculate the AUC?

Thanks!

Knox · March 6, 2023, 11:40pm

Thank you Darius for your help and all the links! I will review all of the links.

Here are some screenshots of various rows of my data (I can only add one image as a new user, I may have to make several posts):

Knox · March 6, 2023, 11:41pm

Additional image:

Screenshot 2023-03-06 183655

Knox · March 6, 2023, 11:41pm

additional image:

Screenshot 2023-03-06 183712

darius · March 7, 2023, 1:04am

Could you turn your rows into X - Y charts ?

I am sorry, but my answer to you turned hidden, so my job is lost

I can try to send you private message, if supported.

Post hidden by community flags

system

44m

Hello,

This is an automated message from Discussions on Python.org to let you know that your post was hidden.

https://discuss.python.org/t/calculate-area-under-curve-given-a-set-of-coordinates/24540/2

Your post was flagged as spam: the community feels it is an advertisement, something that is overly promotional in nature instead of being useful or relevant to the topic as expected.

This post was hidden due to flags from the community, so please consider how you might revise your post to reflect their feedback. You can edit your post after 10 minutes, and it will be automatically unhidden.

However, if the post is hidden by the community a second time, it will remain hidden until handled by staff.

For additional guidance, please refer to our community guidelines.

steven.daprano · March 7, 2023, 1:20am

I’m not sure how to get the data points out of Excel ibto Python:

there are Excel to Python libraries, but I’ve never used them;
or you can convert the spreadsheet to CSV and then use the csv module to import the data.

Let us know if you need more help with this step.

Once you have your data available to Python, put the x and y values in
two lists:

xs = [ ... ]  # x values
ys = [ ... ]  # y values
assert len(x) == len(y)

If the two sets of data aren’t the same length, you have to decide how to even them out. That might mean truncating the x values, or adding zeroes to the y values.

At this point we can work out an area under the curve, using numerical integration and an approximation to the hypothetical curve.

Model the curve by a series of rectangles;
Model the curve by a series of trapezoids;
Simpson’s rule;
or something more complicated.

There are three ways to do rectangular steps, but we can only use the left and right methods, not the midpoint method. There are two versions of Simpson’s rule, but both are a little complex so I’m going to skip them because I’m lazy.

This is a little easier than regular integration, because we don’t have a choice about the step size: we have to use the given points. So on that note, here are three methods for you to try:

from itertools import pairwise

xs = [1.0, 1.5, 2.0, 2.25, 2.5, 2.75, 3.75, 4.0, 4.25, 5.5]
ys = [3, 4, 7, 9, 6, 4, 5, 7, 9, 8]
assert len(xs) == len(ys)

def area_left(xs, ys):
    area = 0
    for (ax, ay), (bx, by) in pairwise(zip(xs, ys)):
        h = bx - ax
        area += h*ay
    return area

def area_right(xs, ys):
    area = 0
    for (ax, ay), (bx, by) in pairwise(zip(xs, ys)):
        h = bx - ax
        area += h*by
    return area

def area_trapezoid(xs, ys):
    area = 0
    for (ax, ay), (bx, by) in pairwise(zip(xs, ys)):
        h = bx - ax
        area += h*(ay + by)/2
    return area

assert area_left(xs, ys) == 27.25
assert area_right(xs, ys) == 29.25
assert area_trapezoid(xs, ys) == 28.25

(Note: in general the trapezoidal method will be more accurate, unless the data points genuinely come from some sort of step function.)

Knox · March 7, 2023, 1:22am

Here’s an X-Y chart.

Screenshot 2023-03-06 202053

I can actually still see your post; maybe it remains visible to me because I’m the original poster.

Knox · March 7, 2023, 1:30am

Thanks Steven!

Yes, I can convert the Excel to CSV.

I tried to go to my Command Prompt (Windows) and “pip install csv module”, but I received the error below:

pip install csv module
File “”, line 1
pip install csv module
^^^^^^^
SyntaxError: invalid syntax

Once I install the csv module, I’m not sure how to import the data into Python, so any suggestions there would be appreciated.

Once I have the data imported I will try the three methods you outlined.

Thanks again, this is great!

Knox · March 7, 2023, 1:36am

Replying to my own post: I entered import CSV and I think it worked to install the module. Not sure how I would know if it installed or not.

darius · March 7, 2023, 1:37am

Native EXCEL supports charts

How to Make a Chart in Excel

Step 1: Select Chart Type Once your data is highlighted in the Workbook, click the Insert tab on the top banner. …
Step 2: Create Your Chart From the Insert tab, click the column chart icon and select Clustered Column. …

(don’t give web links from Bing not to get my answer hidden again)

Knox · March 7, 2023, 2:07am

I believe I imported the CSV file using

import csv
with open(“suppliers.csv”, “r”) as f:
contents = csv.reader(f)
for c in contents:
print(c)

Knox · March 7, 2023, 2:07am

Yes, please see the associated chart above

Knox · March 7, 2023, 2:25am

I tried using the trapezoid code from Steven above, but I received the error below:

def area_trapezoid(xs, ys):
… area = 0
… for (ax, ay), (bx, by) in pairwise(zip(xs, ys)):
… h = bx - ax
… area ± h*(ay + by)/2
… return area
… assert area_trapzoid(xs, ys) ==
File “”, line 7
assert area_trapzoid(xs, ys) ==

I believe I skipped this step:

xs = [ … ] # x values
ys = [ … ] # y values
assert len(x) == len(y)

can you further explain the code above? do I just type the code you have there exactly as you have it written?

Regarding this: If the two sets of data aren’t the same length, you have to decide how to even them out. That might mean truncating the x values, or adding zeroes to the y values.

Do you just mean that I need to assure that for every x there is a y, or do you mean the values themselves have to have the same number of digits?

Thanks again! I think I’m getting close!

steven.daprano · March 7, 2023, 3:30am

You don’t need to install the csv module, it is a standard module that comes pre-installed with Python.

pip is not a Python module as such (well, it kinda is, but let’s not get bogged down in technicalities). It is an application (“app”) or tool that is written in Python, but you have to call it from your operating system’s command prompt, not the Python interpreter.

You don’t use pip from inside the Python interpreter: if you see the >>> prompt, you are inside the interpreter, also called the Python REPL (Read Evaluate Print Loop).

You need to call pip from your OS’s command prompt, also called a “shell” or “xterm” or “terminal” or “console”. Typically it will have a prompt that looks like $ or %.

(Windows users may know it as command.com or cmd.exe, I think. Hey Windows folks, what is the preferred shell these days? Powershell?)

You can call pip install SomePackageName but we recommend that you instead use python -m pip install SomePackageName instead, as that is more reliable when you have more than one Python interpreter installed.

steven.daprano · March 7, 2023, 4:20am

Hi Nick,

You cut off the most important part of the error, namely, the actual error itself.

There is no need to retype the functions, that is just likely to introduce errors or typos, including leaving out blank lines. Instead, you should copy and paste it from my post, preferably into a .py file for future reuse.

You don’t need the three lines:


xs = [1.0, 1.5, blah blah blah]

ys = [3, 4, blah blah blah]

assert len(xs) == len(ys)

as they are just some demonstration data. Likewise, at the very end, you don’t need the three assertions, as they are just a demonstration:


assert area_left(xs, ys) == 27.25

assert area_right(xs, ys) == 29.25

assert area_trapezoid(xs, ys) == 28.25

Those results – 27.25, 29.25 and 28.25 – only apply to the fake demonstration data I invented at the beginning. So you don’t need them, except to verify that (1) the code does what I say it does, and (2) you have copied everything correctly.

There is no harm in leaving them in, if you wish, so long as you copy them correctly and don’t introduce any errors.

The parts you really need are the import at the beginning, and the three functions. (Make sure you leave a blank line between them.) Once you have the three functions, read your data values from the CSV. You need the x and y values to be in separate lists. I called them xs and ys but you can call them anything you like.

Let’s suppose you call them “xdata” and “ydata”. Then you can calculate the area like this:


result = area_trapezoid(xdata, ydata)

print(result)

And similar for the other two area functions.

You need there to be a y for every x. The number of digits is irrelevant.

Knox · March 7, 2023, 12:11pm

thx! will give it another try.

Knox · March 7, 2023, 12:32pm

Ok I did import the csv module and I was able to open my file (“shen.csv”)

Putting it all together, would it look like this:

import csv
with open(“shen.csv”,“r”) as f:
… contents=csv.reader(f)
def area_left(xs, ys):
area = 0
for (ax, ay), (bx, by) in pairwise(zip(xs, ys)):
h = bx - ax
area += hay
return area
def area_right(xs, ys):
area = 0
for (ax, ay), (bx, by) in pairwise(zip(xs, ys)):
h = bx - ax
area += hby
return area
def area_trapezoid(xs, ys):
area = 0
for (ax, ay), (bx, by) in pairwise(zip(xs, ys)):
h = bx - ax
area += h*(ay + by)/2
return area
result = area_trapezoid(xdata, ydata)

print(result)