Duplicate values

arvin_90 · May 25, 2022, 7:02am

Drop duplicates in one column but keep the data in the other columns

   A   B  C
0  a1  0  0
1  a1  1  0
2  a1  1  2
3  a4  2  4
4  a5  1  4

This is what I want:

   A   B  C
0  a1  0  0
1      1  0
2      1  2
3  a4  2  4
4  a5  1  4

mlgtechuser · May 25, 2022, 8:21am

Hello Arvin.

Is this school work?

If it is school work, then you are much better served to find the answer yourself. You will not learn as much from other people’s answers.

For example: What function(s) would you use?

Are you required to use one specific function that you are studying?

cameron · May 25, 2022, 10:30pm

How are these data stored? I would guess a DataFrame from the output,
but you haven’t said.

What have you tried so far to solve this problem?

Cheers,
Cameron Simpson cs@cskk.id.au

arvin_90 · May 26, 2022, 4:23am

Yes, it is in a DataFrame.
Basically it is a ledger-entries dataset. After some brainstorming I managed to get the debit and credit entries into one row, but that created a problem I am not able to resolve.
The problem is that duplicate values appear in the Debit column, because there are more Credit values than Debit values.

     Entry  Debit  Credit
101  abc    1000   200
101  abc    1000   500
101  abc    1000   300

And I want it in this format:

     Entry  Debit  Credit
101  abc    1000   200
101                500
101                300

cameron · May 26, 2022, 8:57am

By Arvin via Discussions on Python.org at 26May2022 04:34:

Yes, it is in a DataFrame.
Basically it is a ledger-entries dataset. After some brainstorming I managed to get the debit and credit entries into one row, but that created a problem I am not able to resolve.

Can you show the code which did this for you?

Also, can you show the original dataset?

The problem is that duplicate values appear in the Debit column, because there are more Credit values than Debit values.

How do you know that?

     Entry  Debit  Credit
101  abc    1000   200
101  abc    1000   500
101  abc    1000   300

And I want it in this format:
     Entry  Debit  Credit
101  abc    1000   200
101                500
101                300

My DataFrame abilities are weak, but in theory you could write a loop
which iterated over the frame rows, with a reference to the previous row
to hand. If the entry in the current row matches the previous row, you
could scrub the current row’s value. Something a bit like this:

prev_entry = None
for idx, row in df.iterrows():
    # iterrows() yields copies of each row, so write back through df.at
    if prev_entry is not None and prev_entry == row['Entry']:
        df.at[idx, 'Entry'] = ''
    prev_entry = row['Entry']

Note: totally untested.
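
A vectorised alternative to the row loop, offered as a sketch rather than a tested answer for the real data. It assumes the three sample rows shown above with 101 as the index, and it blanks the values on a copy so the underlying numbers stay intact. Note that `duplicated()` flags any repeat of an earlier row, not only a repeat of the immediately preceding one.

```python
import pandas as pd

# Sample rows from the thread; using 101 as the index is an assumption.
df = pd.DataFrame(
    {
        "Entry": ["abc", "abc", "abc"],
        "Debit": [1000, 1000, 1000],
        "Credit": [200, 500, 300],
    },
    index=[101, 101, 101],
)

# True where the (index, Entry, Debit) combination has already appeared.
keys = df.reset_index()[["index", "Entry", "Debit"]]
repeat = keys.duplicated().to_numpy()

# Work on a copy with object columns so an empty string can sit next to numbers.
out = df.astype({"Entry": object, "Debit": object})
out.loc[repeat, ["Entry", "Debit"]] = ""

print(out)
```

The result is only suitable for display: once the cells hold empty strings, the Debit column can no longer be summed.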

Cheers,
Cameron Simpson cs@cskk.id.au

arvin_90 · May 26, 2022, 1:29pm

This is the original dataset:

arvin_90 · May 26, 2022, 1:30pm

The circled values are repeating, which I don't want.

rob42 · May 26, 2022, 3:14pm

As others have asked (three times, I think): what code do you have so far?

arvin_90 · May 27, 2022, 5:24am

First I create a new column, "uniquevalue":
df['uniquevalue'] = df['BranchID'].map(str) + df['VoucherType'].map(str) + df['JournalDate'].map(str) + df['VoucherNo'].map(str)

Then I create a new dataframe:
df_new = df.filter(['uniquevalue', 'VoucherLedger', 'DebitAmount', 'CreditAmount'])

debit = df_new[df_new["DebitAmount"] > 0]
debit.rename(columns = {'VoucherLedger':'DebitVoucher'}, inplace = True)
del debit['CreditAmount']

credit = df_new[df_new["CreditAmount"] > 0]
credit.rename(columns = {'VoucherLedger':'CreditVoucher'}, inplace = True)
del credit['DebitAmount']

dr_cr = pd.DataFrame.merge(debit, credit, on = 'uniquevalue')
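
A small sketch of why this merge repeats the debit amount. The key "V1" and the ledger names are invented for illustration; only the column names and the 1000 / 200 / 500 / 300 amounts come from the posts above. When one debit row shares its key with three credit rows, the merge pairs it with each of them, so the debit amount appears three times.

```python
import pandas as pd

# One debit line and three credit lines for the same (hypothetical) voucher key.
debit = pd.DataFrame(
    {"uniquevalue": ["V1"], "DebitVoucher": ["Cash"], "DebitAmount": [1000]}
)
credit = pd.DataFrame(
    {
        "uniquevalue": ["V1", "V1", "V1"],
        "CreditVoucher": ["Sales", "Fees", "Tax"],
        "CreditAmount": [200, 500, 300],
    }
)

# One-to-many merge: the single debit row is paired with every credit row.
dr_cr = debit.merge(credit, on="uniquevalue")
print(dr_cr)

# Asking pandas to verify a one-to-one relationship exposes the split.
try:
    debit.merge(credit, on="uniquevalue", validate="one_to_one")
    is_one_to_one = True
except pd.errors.MergeError:
    is_one_to_one = False
print("one-to-one:", is_one_to_one)
```

The `validate` argument does not change the result; it only raises an error when the keys are not unique on the side where they are expected to be.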

mlgtechuser · May 27, 2022, 12:35pm

You do not need to delete any values, Arvin. In fact you should not delete any values because these are financial data records. Even if you’re only trying to produce a report, that report will probably have errors if you modify the data or the output values just to make it “look right”. Find the root cause instead of masking the actual bug by patching the output.

The specific problem is that the CreditAmount value doesn’t match the DebitAmount value in the rows after the first ‘Cash’ transaction.

It looks like something is making your CreditAmount pointer stop indexing. However, your dataset is so big that your output only shows the Head and Tail. NEXT STEP: Find the point where the CreditAmount value begins repeating.

THEN: Since your dataset has 91531 entries, be a friend to yourself and work with a smaller dataset. You should be able to test with a dataset having only two or three entries of each CreditVoucher type. See if you can make the bug happen with a much smaller dataset. Be sure to include the CreditVoucher types right before and after the point where the repeating begins.
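
One possible way to locate the repeating rows and cut the data down to a small test set, sketched on invented values. The frame below stands in for the merged `dr_cr`; the keys V1 to V3 and the amounts are assumptions, not the real ledger.

```python
import pandas as pd

# Stand-in for the merged frame; V2 is a transaction split into three credits.
dr_cr = pd.DataFrame(
    {
        "uniquevalue": ["V1", "V2", "V2", "V2", "V3"],
        "DebitAmount": [50, 1000, 1000, 1000, 70],
        "CreditAmount": [50, 200, 500, 300, 70],
    }
)

# Count merged rows per key; any key with more than one row is a split.
rows_per_key = dr_cr.groupby("uniquevalue").size()
split_keys = rows_per_key[rows_per_key > 1].index

# Keep only the split transactions as a small dataset to experiment with.
small = dr_cr[dr_cr["uniquevalue"].isin(split_keys)]
print(small)
```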

arvin_90 · May 27, 2022, 6:07pm

Yes, you are right, my dataset is too big: it has around 250000 rows. This is the data from just one CSV file.

I checked everything and it is perfectly fine. The problem arises when the amount gets split.
For example:
This is the original data:
[image: 123]

arvin_90 · May 27, 2022, 6:08pm

This is what I get when I merge the dataframes:
[image: 1234]

So I don't want the 5000 to be repeated.

I hope I have been able to explain my problem.

mlgtechuser · May 28, 2022, 9:08am

The [repeating value] problem arises when the amount gets split.

This is a much better definition of the problem, Arvin!

Pandas is a Python package, but your question here is probably about Pandas instead of Python.

The first question to answer is…
“Can Pandas do accounting transaction splits? Or do I need custom Python code for these splits?”

You will probably get better answers in a Pandas forum.

Post your question there as “Accounting Transaction Split”. This will attract the attention of people with accounting knowledge who will understand your application.

Your bug is “When I split a transaction, the source value (the Total value for the transaction) is repeated for each split. I need the source value to be zero on each split transaction line.” Look up each word in this quote and make sure you fully understand every word.
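
The behaviour described in that bug statement could be sketched as follows. This is an illustration on invented data, and it assumes each `uniquevalue` has exactly one debit line split across several credit lines; a voucher with several debits and several credits would need different handling. The work is done on a copy so the merged data itself is not altered.

```python
import pandas as pd

# Stand-in for the merged frame; V2 is one debit of 1000 split into three credits.
dr_cr = pd.DataFrame(
    {
        "uniquevalue": ["V1", "V2", "V2", "V2", "V3"],
        "DebitAmount": [50, 1000, 1000, 1000, 70],
        "CreditAmount": [50, 200, 500, 300, 70],
    }
)

# cumcount() numbers the rows inside each key 0, 1, 2, ...;
# anything after the first row is a split line.
later_line = dr_cr.groupby("uniquevalue").cumcount() > 0

# Zero the repeated source value on split lines, leaving dr_cr untouched.
report = dr_cr.copy()
report["DebitAmount"] = report["DebitAmount"].mask(later_line, 0)

print(report)
print(report["DebitAmount"].sum(), report["CreditAmount"].sum())
```

With zeros on the split lines, the two column totals agree again, which is a quick sanity check that no amount was counted twice.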

Make a simple dataset (with only the necessary columns) and post no more than 6 lines of the DATA over to that forum as TEXT characters, not screenshots. You can test a split with two lines. Post no more than three.
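
A short sketch of turning a few rows into plain text that can be pasted into a post. The frame and its values are placeholders; `head` and `to_csv` are standard pandas calls.

```python
import pandas as pd

# Placeholder rows standing in for the reduced dataset.
small = pd.DataFrame(
    {
        "uniquevalue": ["V2", "V2", "V2"],
        "DebitAmount": [1000, 1000, 1000],
        "CreditAmount": [200, 500, 300],
    }
)

# At most six rows, as comma-separated text without the index column.
text = small.head(6).to_csv(index=False)
print(text)
```

Readers can load text in this form straight back into a DataFrame, which a screenshot does not allow.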

Post only the code that performs the split plus any code that allows a reader to understand the setup for the split. If this seems too difficult, then complete a simpler project before continuing with your accounting application.

Best luck to you!
-Leland
