Sorting Overlap Date and Time

cyberblitz · June 24, 2022, 11:11pm

Ok, I have been attempting this problem for 2 solid weeks now and have to admit it’s driving me insane.

Is there a way to sort the below table so that when there is an overlap, the “OverlapIndex” increments, depending on how many overlaps there are in the same AlloIndex group? Now, Overlaps can only occur in the same AlloIndex Group, however, if there are more than one OverLaps in the same group, but the 2 overlap ranges don’t overlap the other overlap ranges, then the OverlapIndex resets to 0. So, if the 1st index in the group does not overlap the 3rd index in the group, then its AlloIndex resets to 0.

Hope this makes sense

Here is the data:

RowText AlloIndex   PosIndex    FromTime            ToTime        OverLapIndex
test1           0                0    21/06/2022 14:42    22/06/2022 2:43     0
test2           0                0    21/06/2022 14:42    22/06/2022 2:43     1
test3           0                0    22/06/2022 2:42     22/06/2022 14:43    0
test4           0                0    22/06/2022 14:42    23/06/2022 2:43     0
test5           5                5    21/06/2022 19:42    22/06/2022 7:43     0
test6           6                6    21/06/2022 16:42    22/06/2022 4:43     0
test7           7                7    21/06/2022 15:42    22/06/2022 3:43     0
test8           7                7    21/06/2022 19:42    22/06/2022 7:43     1
test9           8                8    21/06/2022 21:42    22/06/2022 9:43     0
test10         8                8    21/06/2022 22:42    22/06/2022 10:43    1
test11         9                9    22/06/2022 0:42     22/06/2022 12:43    0
test12         9               9    21/06/2022 15:42    22/06/2022 3:43     1
test13        9                9    22/06/2022 12:42    23/06/2022 12:43    2
test14        9                9    22/06/2022 10:42    23/06/2022 10:43    3
test15       10             10    21/06/2022 20:42    22/06/2022 8:43     0
test16       10             10    22/06/2022 1:42     22/06/2022 13:43    1

The table should look like this:

RowText AlloIndex   PosIndex    FromTime            ToTime        OverLapIndex
test1           0                0    21/06/2022 14:42    22/06/2022 2:43     0
test2           0                0    21/06/2022 14:42    22/06/2022 2:43     1
test3           0                0    22/06/2022 2:42     22/06/2022 14:43    0
test4           0                0    22/06/2022 14:42    23/06/2022 2:43     0
test5           5                5    21/06/2022 19:42    22/06/2022 7:43     0
test6           6                6    21/06/2022 16:42    22/06/2022 4:43     0
test7           7                7    21/06/2022 15:42    22/06/2022 3:43     0
test8           7                7    21/06/2022 19:42    22/06/2022 7:43     1
test9           8                8    21/06/2022 21:42    22/06/2022 9:43     0
test10         8                8    21/06/2022 22:42    22/06/2022 10:43    1
test11         9                9    22/06/2022 0:42     22/06/2022 12:43    0
test12         9               9    21/06/2022 15:42    22/06/2022 3:43     1
test13        9                9    22/06/2022 12:42    23/06/2022 12:43    0
test14        9                9    22/06/2022 10:42    23/06/2022 10:43    1
test15       10             10    21/06/2022 20:42    22/06/2022 8:43     0
test16       10             10    22/06/2022 1:42     22/06/2022 13:43    1

vbrozik · June 25, 2022, 10:07am

There are still many unknowns.

Do you need the result to keep the original sorting order?
To be completely clear: all the overlaps we are talking about are overlaps of the time intervals FromTime - ToTime, right?

Then in processing we should split the data by AlloIndex and process every part independently, right?
Can we assume that AlloIndex is monotonic?
I.e. the following condition is always True? AlloIndex[line] <= AlloIndex[line+1]
It looks like FromTime is monotonic too. I am not yet sure if it helps but can we assume this?

It is becoming complicated and I am not able to make a clear and consistent idea from that.

Here OverLaps is some kind of count/set/list of overlaps. We should define it.
i. We can organize the overlaps into a matrix (2D array) where every interval would have its row and its column.
ii. We will take just a triangular part of the matrix (i.e. ignoring overlaps of a range with itself and overlaps in reverse order - i.e. range1 overlaps range2, range2 overlaps range1).
iii. Is OverLaps the count/set/list of all such overlaps?
“the 2 overlap ranges don’t overlap the other overlap ranges”
i. What is an overlap range? A new range created as the intersection of the overlapping ranges?
ii. You were talking about “more than one” now your are referring just to two. Are these the same?
iii. “the other overlap ranges” - What are these? Overlap ranges in other AlloIndex groups? So my point 3. is invalid?
iv. Or are you saying that the condition is that all the overlaps consist only of pair of ranges?
“the OverlapIndex resets to 0”
i. Are “keeping the original value” and “replacing it with zero” the only operations we are supposed to do with OverlapIndex? It does not look like this. In your example in the line test14 the value of OverLapIndex changed from 3 to 1!
ii. Are changes in the OverlapIndex column the only changes in the data the algorithm should perform?

What are these indices 1st, 3rd? “the group” is still the group of lines with the same AlloIndex? If my point 3. is valid. Let’s forget about other AlloIndex groups in this sub-algorithm description and do not burden our heads with that.
“its AlloIndex resets to 0” - You probably mean OverlapIndex?
As I understood it we do not need the columns RowText and PosIndex for the processing but we need all the other columns, right?
There is a lot of indices in the table and the desctiption: AlloIndex, PosIndex, OverLapIndex, the 1st index in the group, the 3rd index in the group. Maybe we should have some explanation just for all these indices.
It is good that we have an example of input and output data but it would be good to have explanation of few interesting cases of the changes/no changes in the output data.

cyberblitz · June 26, 2022, 12:17am

Ok, I’m going to really simplify this. Concentrating on just one group, group 9, or AlloIndex 9, for example.

Basically, this is for a Gantt-type display, to visualize bed bookings. But there can be more than one booking at the same time, occupying the same row. I’m trying to position the bars in a way where they don’t occupy the same row by themselves if there is no overlap. So instead of taking 4 rows, they only take 2 rows.

Anyway, if you take AlloIndex 9, there are 4 entries: the first two overlap, so one should jump to the next row. The next 2 entries also overlap each other, BUT, they do not overlap the first two, so, they should occupy the same rows as the first two, but further down the Y-axis (this part of the code already works well, so, I’m only interested in the X-axis now). This is where the OverlapIndex comes in because this would position the bars according to their Group Position index. Does this make sense?

vbrozik · June 26, 2022, 8:06am

I am afraid this way it is extremely difficult to help you

We do not see your code you created during the two weeks you mention. Even the simplest part - the parsing of your text table.
We do not know what is the particular problem you are unable to solve.
We do not see answers to any of my questions.
Again we do not see definitions of the terms you are using. For example:
- group 9
- AlloIndex 9
- row
- bars - I think you are referring to the Gantt chart

Draw your example. In your drawing explain what your terms mean. Show the intended result, ideally show interesting corner cases.

Explain what are input data, what are temporary working variables, what are output data. At this moment it is unclear why the input and output data have the same structure.

It can help to simplify the time intervals to integer intervals during the design phase. It is not difficult to replace them later in the code. (In fact POSIX time in seconds is an integer.)

cyberblitz · June 28, 2022, 8:30am

I’m sorry for not explaining it in more detail more, it’s just a bit complicated to explain. However, I’ll begin to answer your questions:

Yes
Yes
Yes
Yes
Yes, but only in the sub-group of the same AlloIndex
The Ovelaps Column is an index of overlaps for each AlloIndex group, so Alloindex 9 would be 0,1,2,3 …normally.
I’m probably not using the right terms here. What I mean is if you have 2 Overlap ranges, like test 11 and test 12, you also have test 13 and test 14, however, in some situations, test 11/12 may not overlap test 13/14, and therefore test 13 should move up an Index, with the series becoming 0,1,0,1.
Yes
As mentioned, the Indices of OverLapIndex would sequentially increase, dependent on how many overlaps there are: so Alloindex 9 would normally be 0,1,2,3 (but the question is how to change this)
Yes, i did. An oversight
Correct
Alloindex represents the actual room the booking is allocated, so bed 1 would be AlloIndex 0, bed 2, would be AlloIndex 1 and so on…
OK, as mentioned, this data is the back-end of a front-end bed booking app, represented by a Gantt-type view. There can be more than one booking for a bed at the same time, and I am trying to organize the data in such a way that rendering the Gantt Bars are orginsed and ordered. Essentially, the PosIndex will be changed to help guide the Y-axis of the Gantt Bar (the posIndex is essentially the index of the row of the Gantt Grid). This is as far as I have got and how I would like it sorted. See below Picture.

GanttBars1506×450 101 KB

vbrozik · June 28, 2022, 2:26pm

OK, thank you for the explanation. I think we managed to extract the core problem by decomposing the complex problem. There was too much at the beginning. Let me just know if I understand the problem correctly:

We got rid of AlloIndex because we need to solve the problem for each value of AlloIndex separately.
The original table showed the data in 1-dimensional representation while the processed data are 2-dimensional. The graphical representation certainly gave us much better idea.
The problem is some kind of interval-packing: Split the list of intervals to certain groups. Inside each group the intervals must not overlap.

Now the unknowns which remain are the criteria for the interval packing into the mentioned groups. Examples of possible criteria/algorithms:

a) Pack the most time possible into the first group. With the remaining intervals do the same for the second group etc… (All regardless the original interval order or their FromTime order) For example in the modified example picture below test17 would replace test16 in the result.
b) Find an optimal packing - pack to the smallest number of groups possible regardless other criteria. (Not sure if the method a) does not give the optimal packing already.) The only unknown remaining would be how to sort the created groups.
c) Take the first interval and put it into a new group. Fill the group selecting most time possible from the remaining intervals. Repeat the same with the remaining intervals. This way the FromTime of the whole groups will remain monotonic as it was in the input data.

Is one of the algorithms the one you want or something else?

cyberblitz · June 28, 2022, 9:53pm

An algorithm would be very useful, yes. So the end result would appear with all bars ‘packed’ against each other, rather than bars taking a row each. No row should have an overlap.Option A seems the way to go. Sorry, i have no example code to share, as i don’t feel it would be very useful as it doesn’t work

mlgtechuser · June 29, 2022, 5:27am

The columnized data below might help. NOTE: TableOUT is just a sample and has ‘<<<<’ at the two merged lines.
I started looking at this but then had to set it aside in favor of billable hours.

tableIN =   '''
            RowText AlloIndex   PosIndex    FromTime            ToTime        OverLapIndex
            test1           0          0    21/06/2022 14:42    22/06/2022 2:43          0
            test2           0          0    21/06/2022 14:42    22/06/2022 2:43          1
            test3           0          0    22/06/2022 2:42     22/06/2022 14:43         0
            test4           0          0    22/06/2022 14:42    23/06/2022 2:43          0
            test5           5          5    21/06/2022 19:42    22/06/2022 7:43          0
            test6           6          6    21/06/2022 16:42    22/06/2022 4:43          0
            test7           7          7    21/06/2022 15:42    22/06/2022 3:43          0
            test8           7          7    21/06/2022 19:42    22/06/2022 7:43          1
            test9           8          8    21/06/2022 21:42    22/06/2022 9:43          0
            test10          8          8    21/06/2022 22:42    22/06/2022 10:43         1
            test11          9          9    22/06/2022 0:42     22/06/2022 12:43         0
            test12          9          9    21/06/2022 15:42    22/06/2022 3:43          1
            test13          9          9    22/06/2022 12:42    23/06/2022 12:43         2 <<<<
            test14          9          9    22/06/2022 10:42    23/06/2022 10:43         3 <<<<
            test15         10         10    21/06/2022 20:42    22/06/2022 8:43          0
            test16         10         10    22/06/2022 1:42     22/06/2022 13:43         1
            '''
tableOUT =  '''
            RowText AlloIndex   PosIndex    FromTime            ToTime        OverLapIndex
            test1           0          0    21/06/2022 14:42    22/06/2022 2:43          0
            test2           0          0    21/06/2022 14:42    22/06/2022 2:43          1
            test3           0          0    22/06/2022 2:42     22/06/2022 14:43         0
            test4           0          0    22/06/2022 14:42    23/06/2022 2:43          0
            test5           5          5    21/06/2022 19:42    22/06/2022 7:43          0
            test6           6          6    21/06/2022 16:42    22/06/2022 4:43          0
            test7           7          7    21/06/2022 15:42    22/06/2022 3:43          0
            test8           7          7    21/06/2022 19:42    22/06/2022 7:43          1
            test9           8          8    21/06/2022 21:42    22/06/2022 9:43          0
            test10          8          8    21/06/2022 22:42    22/06/2022 10:43         1
            test11          9          9    22/06/2022 0:42     22/06/2022 12:43         0
            test12          9          9    21/06/2022 15:42    22/06/2022 3:43          1
            test13          9          9    22/06/2022 12:42    23/06/2022 12:43         0 <<<<
            test14          9          9    22/06/2022 10:42    23/06/2022 10:43         1 <<<<
            test15         10         10    21/06/2022 20:42    22/06/2022 8:43          0
            test16         10         10    22/06/2022 1:42     22/06/2022 13:43         1
            '''

mlgtechuser · July 1, 2022, 9:38pm

Here’s a working solution, @cyberblitz. Since you didn’t say what packages you’re using, if any, and also didn’t say what produces the raw data table in the OP, I only used builtin Python functions and took some liberties to structure the data in a simple form so I could focus on the stacking and concatenating that’s the core of your question.

Restructured Data: <ClickToView>

Padded table with ‘space’ characters to put data into columns (as posted above).
This was necessary because of the space in the dateTime columns, otherwise a simple <string>.split would have broken the contents of each row into separate data elements.
Padded Header row to left-justify each column header at the beginning of the column.
This allows the header row to define the beginning of each data column.

RowText         AlloIndex  PosIndex    FromTime            ToTime                   OverLapIndex
test1           0          0           21/06/2022 14:42    22/06/2022 2:43          0
test2           0          0           21/06/2022 14:42    22/06/2022 2:43          1
test3           0          0           22/06/2022 2:42     22/06/2022 14:43         0
⋮

The core algorithm here just wants to receive a list[] of table rows whose members are a list[] of the values in each row–without the header row.

Preprocessed Data (Prepped for Surgery): <ClickToView>

['test1', '0', '0', '21/06/2022 14:42', '22/06/2022 2:43', '0']
['test2', '0', '0', '21/06/2022 14:42', '22/06/2022 2:43', '1']
['test3', '0', '0', '22/06/2022 2:42', '22/06/2022 14:43', '0']
⋮

The Process

Read the data table into rows (from a file, in this case).
Sort the table by ToTime within each AlloIndex group.
Loop through the table rows.
- compare the current end time with the next start time.
- if: the next booking will fit on the same Gantt chart line, append it to the table as a concatenated item.
- else: increment the AlloIndex and append the row to the table as a "next line’ item (AlloIndex > 0).
Convert the Unix time values back to date strings and put the headers back onto the top row.

import datetime
import time

with open('BookingData.csv', 'r') as fil:
    rawBookings = fil.readlines()

columnNames = rawBookings[0].split()
columns = [rawBookings[0].find(col) for col in columnNames]     #find where headers start
columns.append(len(rawBookings[1]))     #capture end of last column
bookingList = [row.rstrip() for row in rawBookings]     #remove '\n' from each row
bookingList.pop(0)  #get rid of headings

bookingRow = []
bookingTable = []
for row in bookingList:                                                 #loop through the rows
    for idx in range(len(columns)-1):                                   #loop through the columns
        bookingRow.append(row[columns[idx]:columns[idx+1]].rstrip())    #build a list with each column value in the row; remove extra spaces
    bookingTable.append(bookingRow)                                     #build a table of the row lists
    bookingRow = []

for row in bookingTable:
    print(row)

dateFormat = "%d/%m/%Y %H:%M"
for row in bookingTable:    #convert date+time strings to Unix Time (decimal seconds since 1 Jan 1970)
    row[3] = time.mktime(time.strptime(row[3],dateFormat))
    row[4] = time.mktime(time.strptime(row[4],dateFormat))

print("="*100)
print(columnNames)
for row in bookingTable:
    print(row)

currRow = []
alloGroup = []
currAlloIndex = '0'
groupSorted = False
for rowIdx,row in enumerate(bookingTable):          #sort AlloIndex groups by endTime
    if groupSorted and currAlloIndex == row[2]:     #AlloIndex group is sorted; don't sort again
        continue
    groupSorted = False
    currAlloIndex = row[2]
    groupIdx = rowIdx
    while groupIdx != len(bookingTable) and bookingTable[groupIdx][2] == currAlloIndex:  #look through remaining records for this AlloIndex
        alloGroup.append([bookingTable[groupIdx][4],bookingTable[groupIdx]])
        groupIdx += 1
    sortedGroup = sorted(alloGroup, key = lambda x: x[0])
    groupIdx = 0
    for offset,booking in enumerate(sortedGroup):
        bookingTable[rowIdx+offset][5] = 0
        bookingTable[rowIdx+offset] = booking[1]
    alloGroup,sortedGroup = [],[]
    groupSorted = True

Gantt = []
for rowIdx,row in enumerate(bookingTable):
    if not Gantt or row[1] != currAlloIndex:    #if first row of table or first item in AlloIndex group...
        Gantt.append(bookingTable[rowIdx])      #close out Gantt row; append a new row; process next alloIndex
        currAlloIndex = row[1]
        prevEndTime = row[4]
        prevOverlapIdx = 0
        continue
    if row[3] < prevEndTime:    #booking requires new line
        prevOverlapIdx += 1     #set overlapIndex
    else:    #close out Gantt row; reset OverlapIndex
        prevOverlapIdx = 0
    Gantt.append(bookingTable[rowIdx])          #Add row to data table
    bookingTable[rowIdx][5] = prevOverlapIdx    #apply appropriate overlapIndex to table row

for row in Gantt:   #convert Unix Time to date strings
    row[3] = datetime.datetime.fromtimestamp(row[3]).strftime('%d/%m/%Y %H:%M')
    row[4] = datetime.datetime.fromtimestamp(row[4]).strftime('%d/%m/%Y %H:%M')

Gantt.insert(0,rawBookings[0].rstrip())     #add the headers back in

print("="*100)
for row in Gantt:
    print(row)

[details="Output Data (manually columnized) ]

RowText     AlloIndex  PosIndex  FromTime             ToTime               OverLapIndex
['test1',   '0',       '0',      '21/06/2022 14:42',  '22/06/2022 02:43',  0]
['test2',   '0',       '0',      '21/06/2022 14:42',  '22/06/2022 02:43',  1]
['test3',   '0',       '0',      '22/06/2022 02:42',  '22/06/2022 14:43',  2]
['test4',   '0',       '0',      '22/06/2022 14:42',  '23/06/2022 02:43',  0]
['test5',   '5',       '5',      '21/06/2022 19:42',  '22/06/2022 07:43',  0]
['test6',   '6',       '6',      '21/06/2022 16:42',  '22/06/2022 04:43',  0]
['test7',   '7',       '7',      '21/06/2022 15:42',  '22/06/2022 03:43',  0]
['test8',   '7',       '7',      '21/06/2022 19:42',  '22/06/2022 07:43',  1]
['test9',   '8',       '8',      '21/06/2022 21:42',  '22/06/2022 09:43',  0]
['test10',  '8',       '8',      '21/06/2022 22:42',  '22/06/2022 10:43',  1]
['test12',  '9',       '9',      '21/06/2022 15:42',  '22/06/2022 03:43',  0]
['test11',  '9',       '9',      '22/06/2022 00:42',  '22/06/2022 12:43',  1]
['test14',  '9',       '9',      '22/06/2022 10:42',  '23/06/2022 10:43',  0]
['test13',  '9',       '9',      '22/06/2022 12:42',  '23/06/2022 12:43',  0]
['test15',  '10',      '10',     '21/06/2022 20:42',  '22/06/2022 08:43',  0]
['test16',  '10',      '10',     '22/06/2022 01:42',  '22/06/2022 13:43',  1]

[/details]

This is a relatively rudimentary algorithm and won’t optimize every case. For example, another search loop can be added to optimize the concatenation onto each Gantt row based on the duration of each booking.

I assume you have a function to render the Gantt chart…something that converts the bookings table into the Gantt bars. If so, please paste the end result!

[EDIT: ] I forgot to paste the output data.