How to remove repeated tensor values from a list

elenora · July 15, 2024, 7:37pm

Hello. I have a list of float values that serve as ids in my data. In the dataset they originally look like 11.0, 11.2, 16.2, … but when I put them in a Data object like this:

        data_with_id = Data(x=x_without_id, edge_index=edge_index.t().contiguous(), edge_weight=edge_feats, y=ground_truth_labels)
        data_with_id.id_column = torch.tensor(id_column, dtype=torch.float)

I see that torch.tensor changes how they are displayed, like the following:

unique_values: {tensor(11.), tensor(11.2000), tensor(16.2000), tensor(6.1000), tensor(5006.), tensor(17.), tensor(13.2000), tensor(19.1000), tensor(2.), tensor(9.), tensor(5010.), tensor(6.), tensor(14.), tensor(14.1000), tensor(5001.), tensor(5.1000), tensor(6.2000), tensor(5001.), tensor(4.2000), tensor(11.1000), tensor(12.), tensor(5007.), tensor(17.1000), tensor(17.), tensor(4.1000), tensor(8.2000),tensor(19.1000)}

I tried putting them in a set to remove repeated values, but I still see repeated values in it. Can anyone suggest how I can handle this?

Here is the code I have tried so far:

        missing_info = []
        unique_values = set()

        for i in range(start_timestep - 1, start_timestep + timestep - 1):
            if sequence[i].x.size(0) > 0:
                np.set_printoptions(suppress=True)
                id_column_tensor = torch.tensor(sequence[i].id_column)
                id_column_list = [float(f"{value:.4f}") if isinstance(value, float) else value for value in id_column_tensor]
                unique_values.update(id_column_list)
            else:
                raise ValueError(f"The tensor at sequence[{i}].x is empty.")

        print('unique_values:', unique_values)

I also have a list, and I want to check whether any of its values is missing from unique_values. Here is what I tried, but this code reports every value as missing from unique_values, which is wrong:

        for i in range(start_timestep - 1, start_timestep + timestep - 1):
            np.set_printoptions(suppress=True)
            id_column_tensor = torch.tensor(sequence[i].id_column)
            list_tensor = torch.tensor([float(f"{value:.4f}") if isinstance(value, float) else value for value in id_column_tensor])
            for j in unique_values:
                if not torch.isin(torch.tensor(j, dtype=list_tensor.dtype), list_tensor):
                    missing_info.append(j)

brass75 · July 15, 2024, 7:43pm

Can you use strings instead of floats? So '11.0' instead of 11.0?

elenora · July 15, 2024, 8:41pm

Thank you for your reply. I switched to str:

        unique_values = set()

        for i in range(start_timestep - 1, start_timestep + timestep - 1):
            if sequence[i].x.size(0) > 0:
                np.set_printoptions(suppress=True)
                id_column_tensor = torch.tensor(sequence[i].id_column)
                id_column_list = [str(value.item()) if torch.is_tensor(value) else str(value) for value in id_column_tensor]
                # id_column_list = [float(f"{value:.4f}") if isinstance(value, float) else value for value in id_column_tensor]
                unique_values.update(id_column_list)
            else:
                raise ValueError(f"The tensor at sequence[{i}].x is empty.")

        print('unique_values:', unique_values)

but now the values look like the following (maybe scientific notation?):
'3.200000047683716', '17.0', '3.0', '4.0', '6.0', '3.0999999046325684', '7.0', '18.100000381469727', '18.200000762939453', '1.0', '5001.0', '12.100000381469727', '11.199999809265137', '16.100000381469727', '5008.0', '5007.0', '12.0', '5010.0', '9.199999809265137', '6.099999904632568', '5005.0', '8.100000381469727', '14.100000381469727', '11.0', '9.0', '5009.0', '14.0', '6.199999809265137', '8.0', '5003.0', '18.0', '13.100000381469727', '15.0', '11.100000381469727', '5004.0', '13.0', '17.100000381469727', '19.0', '8.199999809265137', '5006.0', '4.099999904632568', ...

brass75 · July 16, 2024, 1:18pm

That has more to do with the precision of floating point values than anything else. It might make sense to reach out to the maintainers of the package you are using and see if they can suggest anything.
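To make the precision point concrete: torch.float is 32-bit, and a value such as 11.2 has no exact 32-bit representation. Calling .item() returns the nearest stored value as a 64-bit Python float, which then prints with all of its digits. That is ordinary decimal output, not scientific notation. Below is a minimal sketch, assuming the ids carry at most one decimal place as in the examples in this thread (the sample values and the one-decimal format are assumptions, not something the dataset confirms):

```python
import torch

# Sample ids taken from the values quoted in this thread.
id_column = [11.0, 11.2, 16.2, 11.2]

ids32 = torch.tensor(id_column, dtype=torch.float)  # torch.float is float32

# .item() widens the stored float32 to a 64-bit Python float, which
# exposes the rounding error that the tensor repr hides.
raw_strings = [str(v.item()) for v in ids32]

# Formatting with a fixed number of decimals recovers the original labels.
# ASSUMPTION: one decimal place is enough to identify an id.
labels = [f"{v:.1f}" for v in ids32.tolist()]
unique_labels = sorted(set(labels))

print(raw_strings)
print(unique_labels)  # ['11.0', '11.2', '16.2']
```

Storing the column with dtype=torch.float64 should also round-trip these values unchanged, since Python floats are already 64-bit.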

elenora · July 16, 2024, 5:53pm

Thank you for your answer. How can I check whether a tensor value exists in a set of tensor values? And why do I see repeated values in unique_values, which is a set? How can I remove the repeated values from it?

brass75 · July 16, 2024, 6:54pm

Are you seeing identical values in the set, or near-identical values that might have come from very close float values?
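One way to tell the two cases apart: iterating over a tensor yields zero-dimensional tensors rather than Python floats, and as far as I know torch.Tensor hashes by object identity, not by value. A set of such elements would therefore keep exact duplicates. A small sketch with made-up sample values (the 4-decimal rounding is an assumed tolerance):

```python
import torch

ids = torch.tensor([11.0, 11.2, 11.0, 16.2, 11.2], dtype=torch.float)

# Iterating a tensor yields 0-d tensors; each is a distinct object, so
# the set keeps all five even though the values repeat.
tensor_set = set(ids)

# .tolist() converts to plain Python floats, which hash by value.
# ASSUMPTION: rounding to 4 decimals is enough to hide float32 noise.
float_set = {round(v, 4) for v in ids.tolist()}

# torch.unique deduplicates while staying in tensor form.
unique_tensor = torch.unique(ids)

print(len(tensor_set))        # 5
print(sorted(float_set))      # [11.0, 11.2, 16.2]
print(unique_tensor.numel())  # 3
```

In the loop posted at the top of the thread, isinstance(value, float) would be False for these elements, so the tensors themselves end up in the set.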

elenora · July 17, 2024, 12:13am

Yes, I see identical values in the set.
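For the membership question, here is a sketch using torch.isin, which tests each element of its first argument against the second. The names known_ids and candidates are hypothetical stand-ins for unique_values (as plain floats) and the list being checked. Both tensors are built with the same dtype so that equal ids compare equal exactly:

```python
import torch

# Hypothetical stand-ins: known_ids plays the role of unique_values
# (as plain floats), candidates is the list to check against it.
known_ids = [11.0, 11.2, 16.2, 5001.0]
candidates = [11.2, 12.1, 5001.0, 19.0]

known = torch.tensor(known_ids, dtype=torch.float)
cand = torch.tensor(candidates, dtype=torch.float)

# True where a candidate also occurs in known.
present = torch.isin(cand, known)

# Candidates that do not occur in known, rounded for display.
# ASSUMPTION: 4 decimals is enough to identify an id.
missing_info = [round(v, 4) for v in cand[~present].tolist()]

print(missing_info)  # [12.1, 19.0]
```

The argument order matters here: the loop in the first post tests each entry of unique_values against the per-timestep list, which answers the opposite question.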