Duplicate values

Drop duplicates in one column but keep the data in the other columns

     A   B  C
0   a1   0  0
1   a1   1  0
2   a1   1  2
3   a4   2  4
4   a5   1  4

This is what I want:

     A   B  C
0   a1   0  0
1        1  0
2        1  2
3   a4   2  4
4   a5   1  4

Hello Arvin.

Is this school work?

If it is school work, then you are much better served to find the answer yourself. You will not learn as much from other people’s answers.

For example: What function(s) would you use?

Are you required to use one specific function that you are studying?

How are these data stored? I would guess a DataFrame from the output,
but you haven’t said.

What have you tried so far to solve this problem?

Cheers,
Cameron Simpson cs@cskk.id.au

Yes, it is in a DataFrame.
Basically it is a ledger entries dataset. After some brainstorming I managed to get the debit and credit entries in one row, but that raised one problem that I am not able to resolve.
The problem is that duplicate values appear in the Debit column, because there are more Credit column values than Debit column values.

       Entry   Debit   Credit
101    abc     1000    200
101    abc     1000    500
101    abc     1000    300

And I want it in this format:

       Entry   Debit   Credit
101    abc     1000    200
101                    500
101                    300

By Arvin via Discussions on Python.org at 26May2022 04:34:

Yes, it is in a DataFrame.
Basically it is a ledger entries dataset. After some brainstorming I managed to get the debit and credit entries in one row, but that raised one problem that I am not able to resolve.

Can you show the code which did this for you?

Also, can you show the original dataset?

The problem is that duplicate values appear in the Debit column, because there are more Credit column values than Debit column values.

How do you know that?

       Entry   Debit   Credit
101    abc     1000    200
101    abc     1000    500
101    abc     1000    300

And I want it in this format:

       Entry   Debit   Credit
101    abc     1000    200
101                    500
101                    300

My DataFrame abilities are weak, but in theory you could write a loop
which iterated over the frame rows, with a reference to the previous row
to hand. If the entry in the current row matches the previous row, you
could scrub the current row’s value. Something a bit like this:

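prev_entry = None
for idx, row in df.iterrows():
    # row is a copy, so write the change back to the frame with df.at
    # (this assumes the row labels are unique)
    if prev_entry is not None and row['Entry'] == prev_entry:
        df.at[idx, 'Entry'] = ''
    prev_entry = row['Entry']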

Note: totally untested.
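
The previous-row comparison described above can also be written without an explicit loop. The following is only a minimal sketch, assuming the data is a pandas DataFrame shaped like the Entry/Debit/Credit example. The column name "Voucher" is a hypothetical label for the unlabeled leading column, and `blanked` is just an illustrative name.

```python
import pandas as pd

# Values copied from the example table; "Voucher" is a hypothetical name for
# the unlabeled leading column.
df = pd.DataFrame({
    "Voucher": [101, 101, 101],
    "Entry": ["abc", "abc", "abc"],
    "Debit": [1000, 1000, 1000],
    "Credit": [200, 500, 300],
})

# True where Entry and Debit both match the row directly above.
same_as_prev = df["Entry"].eq(df["Entry"].shift()) & df["Debit"].eq(df["Debit"].shift())

# Work on a copy so the original numbers stay intact.
blanked = df.copy()
blanked["Entry"] = blanked["Entry"].mask(same_as_prev, "")
# Cast to object first: the column will hold both numbers and empty strings.
blanked["Debit"] = blanked["Debit"].astype(object).mask(same_as_prev, "")

print(blanked)
```

Because the blanked Debit column mixes numbers and empty strings, the copy is suitable for display only; any sums or other arithmetic should be done on the untouched `df`.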

Cheers,
Cameron Simpson cs@cskk.id.au


This is the original dataset:


These circled values are repeating, which I don't want.

As other posters have asked (three times, I think): what code have you got so far?

First I create a new column, "uniquevalue":
df['uniquevalue'] = df['BranchID'].map(str) + df['VoucherType'].map(str) + df['JournalDate'].map(str) + df['VoucherNo'].map(str)

Then I create a new DataFrame:
df_new = df.filter(['uniquevalue', 'VoucherLedger', 'DebitAmount', 'CreditAmount'])

debit = df_new[df_new['DebitAmount'] > 0]
debit.rename(columns={'VoucherLedger': 'DebitVoucher'}, inplace=True)
del debit['CreditAmount']

credit = df_new[df_new['CreditAmount'] > 0]
credit.rename(columns={'VoucherLedger': 'CreditVoucher'}, inplace=True)
del credit['DebitAmount']

dr_cr = pd.merge(debit, credit, on='uniquevalue')
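
For readers following along, here is why a merge like this repeats values: `merge` pairs every debit row with every credit row that shares the same key, so a voucher with one debit line and three credit lines produces three rows, each carrying the same DebitAmount. Below is a small sketch with made-up values; the key `V1`, the ledger names and the amounts are assumptions for illustration and are not taken from the real dataset.

```python
import pandas as pd

# One debit line for voucher V1 (made-up values).
debit = pd.DataFrame({
    "uniquevalue": ["V1"],
    "DebitVoucher": ["Cash"],
    "DebitAmount": [1000],
})

# Three credit lines for the same voucher (made-up values).
credit = pd.DataFrame({
    "uniquevalue": ["V1", "V1", "V1"],
    "CreditVoucher": ["Sales", "Fees", "Tax"],
    "CreditAmount": [200, 500, 300],
})

# Each debit row is matched with every credit row that has the same key.
dr_cr = debit.merge(credit, on="uniquevalue")

print(dr_cr)
print(len(dr_cr))                  # 3 rows for a single debit line
print(dr_cr["DebitAmount"].sum())  # 3000, although only 1000 was debited
```

The repetition is therefore expected behaviour of a one-to-many merge rather than corrupted data, but summing DebitAmount on the merged frame would overstate the debit side.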

You do not need to delete any values, Arvin. In fact you should not delete any values because these are financial data records. Even if you’re only trying to produce a report, that report will probably have errors if you modify the data or the output values just to make it “look right”. Find the root cause instead of masking the actual bug by patching the output.

The specific problem is that the CreditAmount value doesn’t match the DebitAmount value in the rows after the first ‘Cash’ transaction.

It looks like something is making your CreditAmount pointer stop indexing. However, your dataset is so big that your output only shows the Head and Tail. NEXT STEP: Find the point where the CreditAmount value begins repeating.

THEN: Since your dataset has 91531 entries, be a friend to yourself and work with a smaller dataset. You should be able to test with a dataset having only two or three entries of each CreditVoucher type. See if you can make the bug happen with a much smaller dataset. Be sure to include the CreditVoucher types right before and after the point where the repeating begins.
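
One possible way to carry out both steps is sketched below, assuming the merged frame is called `dr_cr` and has `uniquevalue` and `DebitAmount` columns as in the posted code. The sample rows are made up, and the window size (one row before, a few rows after) is an arbitrary choice.

```python
import pandas as pd

# Made-up stand-in for the merged frame; only the column names follow the thread.
dr_cr = pd.DataFrame({
    "uniquevalue": ["V1", "V2", "V2", "V2", "V3", "V4"],
    "DebitAmount": [400, 5000, 5000, 5000, 750, 120],
    "CreditAmount": [400, 2000, 2500, 500, 750, 120],
})

# Flag every row whose (key, DebitAmount) pair occurs more than once.
is_repeated = dr_cr.duplicated(subset=["uniquevalue", "DebitAmount"], keep=False)

if is_repeated.any():
    # Position of the first repeated row.
    first_pos = int(is_repeated.to_numpy().argmax())
    # Keep one row before that point and a few rows after it.
    start = max(first_pos - 1, 0)
    small = dr_cr.iloc[start:first_pos + 4]
    print(first_pos)
    print(small)
```

Note that this check also flags vouchers that legitimately contain two identical debit lines, so the flagged rows still need to be compared against the original ledger.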

Yes, you are right, my dataset is too big; it has around 250000 rows. This is the data from just one CSV file.

I checked everything and it is perfectly fine. The problem arises when the amount gets split.
For example, this is the original data:
123

This is what I get when I merge the DataFrames:
1234

So I don't want the 5000 to be repeated.

I hope I have been able to explain my problem.

The [repeating value] problem arises when the amount gets split.

This is a much better definition of the problem, Arvin! :+1:

Pandas is a Python package, but your question here is probably about Pandas instead of Python.

The first question to answer is…
“Can Pandas do accounting transaction splits? Or do I need custom Python code for these splits?”

You will probably get better answers in a Pandas forum.

Post your question there as “Accounting Transaction Split”. This will attract the attention of people with accounting knowledge who will understand your application.

Your bug is “When I split a transaction, the source value (the Total value for the transaction) is repeated for each split. I need the source value to be zero on each split transaction line.” Look up each word in this quote and make sure you fully understand every word.
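
To make that requirement concrete, here is a hedged sketch of one way it might be expressed in pandas. It assumes each `uniquevalue` key has a single debit line that the merge has repeated, and it changes only a copy meant for reporting, not the ledger data itself. All values are made up, and `report` is an illustrative name.

```python
import pandas as pd

# Made-up merged frame: voucher V1 has one debit of 5000 split over three credits.
dr_cr = pd.DataFrame({
    "uniquevalue": ["V1", "V1", "V1", "V2"],
    "DebitVoucher": ["Cash", "Cash", "Cash", "Bank"],
    "DebitAmount": [5000, 5000, 5000, 750],
    "CreditVoucher": ["Sales", "Fees", "Tax", "Sales"],
    "CreditAmount": [2000, 2500, 500, 750],
})

# True for every repeat of a debit line after its first occurrence.
is_repeat = dr_cr.duplicated(subset=["uniquevalue", "DebitVoucher", "DebitAmount"])

# Zero the repeated source value on a copy used only for the report.
report = dr_cr.copy()
report["DebitAmount"] = report["DebitAmount"].mask(is_repeat, 0)

print(report)
# Sanity check: debits and credits balance again after zeroing the repeats.
assert report["DebitAmount"].sum() == report["CreditAmount"].sum()
```

If a voucher can contain two genuinely separate debit lines with the same ledger and amount, this rule would wrongly zero one of them, so the single-debit assumption should be verified first.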

Make a simple dataset (with only the necessary columns) and post no more than 6 lines of the DATA over to that forum as TEXT characters, not screenshots. You can test a split with two lines. Post no more than three.

Post only the code that performs the split plus any code that allows a reader to understand the setup for the split. If this seems too difficult, then complete a simpler project before continuing with your accounting application.

Best luck to you!
-Leland