Total no. of rows in a csv

shomikc · June 17, 2023, 10:58am

Hi. I need help to understand this question.

“Find the number of distinct bookings from the given dataset in bookings.csv”. Does this have to be done with pandas? Please help. Thanks.

kknechtel · June 17, 2023, 11:25am

Did the instructor tell you to use Pandas? (How did you know that there is such a thing in the first place?) Does it say anything else in the assignment at all?

shomikc · June 17, 2023, 12:34pm

No sir. I don’t know. This was an assignment in the final course project. I missed the previous 3 modules, which covered pandas, SQL and matplotlib. So my grade has gone down, and to improve it I have to do the project first. So I was just guessing. Could you please tell me how I should proceed? Thank you. (I am doing a course in Python from an institute called Learnbay.)

kknechtel · June 17, 2023, 10:43pm

Well, how can anyone on the Internet help with that? We aren’t taking your course, so we don’t know what you were supposed to learn in those modules.

cameron · June 18, 2023, 3:01am

You don’t need pandas for this. Pandas does have a very convenient
read_csv function which reads a CSV file and returns a DataFrame:
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html#pandas.read_csv

You’ll see it has very many parameters, but aside from the CSV filename
itself they are nearly all optional - supply only those you need to make
the load work correctly.

The nice thing about a DataFrame is that it also has many methods,
which might make finding distinct bookings easier. On the other hand, it
has many methods, and finding what you want may take a while.

The other way is with the csv module from Python’s standard library.

You get a csv.reader instance and read all the rows from it.
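
As a minimal sketch of that approach, the snippet below reads rows with csv.reader. The in-memory sample and its column names are made up for illustration and stand in for the real bookings.csv, whose layout is not shown in this thread.

```python
import csv
import io

# Hypothetical sample standing in for bookings.csv; the real columns may differ.
sample = io.StringIO(
    "booking_id,customer_id\n"
    "b1,c1\n"
    "b2,c2\n"
    "b1,c1\n"
)

reader = csv.reader(sample)
header = next(reader)   # the first row holds the column names
rows = list(reader)     # every remaining row, each one a list of strings

print(header)
print(len(rows))        # total number of data rows, duplicates included
```

With a real file you would pass an open file object instead, for example `with open('bookings.csv', newline='') as f:` followed by `csv.reader(f)`.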

The importance of the word “distinct” is that you only want to count rows as
different according to some criteria. For example, if the CSV is just an
accumulation of records, including revisions to older records, then
you’re in a sense only interested in the last row for each booking. The
bookings will be identified by a particular column or combination of
columns. Those columns comprise the key whose various values you want to
count.

Typically you would do this with a dict, storing each row in the
dictionary according to the key, perhaps keeping only the last.
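
A rough sketch of the dict idea, assuming a hypothetical booking_id column is the key and that later rows are revisions of earlier ones (neither is confirmed for the actual dataset):

```python
import csv
import io

# Hypothetical sample: b1 appears twice, the second row being a revision.
sample = io.StringIO(
    "booking_id,status\n"
    "b1,pending\n"
    "b2,confirmed\n"
    "b1,confirmed\n"
)

latest_by_booking = {}
for row in csv.DictReader(sample):
    # A later row with the same key overwrites the earlier one,
    # so only the last row for each booking is kept.
    latest_by_booking[row["booking_id"]] = row

print(len(latest_by_booking))             # number of distinct bookings
print(latest_by_booking["b1"]["status"])  # status from the last b1 row
```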

But if all you need to do is to count the distinct keys, use a set,
and add just the key to the set. Then measure the length of the set when
finished.
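
For counting alone, the set version might look like the sketch below. The booking_id column name and the sample rows are assumptions for illustration only.

```python
import csv
import io

# Hypothetical sample standing in for bookings.csv.
sample = io.StringIO(
    "booking_id,customer_id\n"
    "b1,c1\n"
    "b2,c2\n"
    "b1,c1\n"
    "b3,c2\n"
)

seen_keys = set()
for row in csv.DictReader(sample):
    # Adding a key that is already present leaves the set unchanged.
    seen_keys.add(row["booking_id"])

n_distinct_bookings = len(seen_keys)
print(n_distinct_bookings)
```

If the key is a combination of columns, a tuple such as `(row["booking_id"], row["customer_id"])` can be added to the set instead.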

Cheers,
Cameron Simpson cs@cskk.id.au

shomikc · June 18, 2023, 8:45am

Thank you for your help.

shomikc · June 18, 2023, 10:14am

Hi, Mr. Cameron Sir,

I have done it this way.

import pandas as pd
data = pd.read_csv('C:\Bookings.csv')
#data.info()
a = data['booking_id'].nunique()
print('Unique Booking Id: ',a)

data1 = pd.read_csv('C:\Sessions.csv')
b = data1['session_id'].nunique()
c = data1['search_id'].nunique()
print('Unique Session Id: ',b,'    Unique Search Id:  ',c)

Can b and c be found using only one line of code?

Please help. Thank you.

cameron · June 18, 2023, 10:48am

> I have done it this way.
>
> import pandas as pd
> data = pd.read_csv('C:\Bookings.csv')
> #data.info()
> a = data['booking_id'].nunique()
> print('Unique Booking Id: ',a)

Ah, the DataFrame and its many methods.

> data1 = pd.read_csv('C:\Sessions.csv')
> b = data1['session_id'].nunique()
> c = data1['search_id'].nunique()
> print('Unique Session Id: ',b,'    Unique Search Id:  ',c)
>
> Can b and c be found using only one line of code?

Well, there’s the trite:

 b = data1['session_id'].nunique(); c = data1['search_id'].nunique()

or:

 b, c = data1['session_id'].nunique(), data1['search_id'].nunique()

but they’re really no better. You’re only computing 2 values, there’s no
need for anything clever.
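
For completeness, pandas can also compute both counts in a single call, because DataFrame.nunique returns one count per column. The sketch below uses a small made-up DataFrame in place of Sessions.csv; it is one option, not necessarily a clearer one than two separate lines.

```python
import pandas as pd

# Hypothetical data standing in for the contents of Sessions.csv.
sessions = pd.DataFrame(
    {
        "session_id": ["s1", "s1", "s2", "s3"],
        "search_id": ["q1", "q2", "q2", "q2"],
    }
)

# Selecting both columns and calling nunique() gives a Series
# indexed by column name, with one distinct count per column.
counts = sessions[["session_id", "search_id"]].nunique()

# The Series can be unpacked in column order.
n_unique_sessions, n_unique_searches = counts
print(n_unique_sessions, n_unique_searches)
```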

We almost never use the “statement; statement” syntax in Python, BTW.

Some random remarks:
data and data1 are not great names; consider bookings and
sessions, which are easier to remember.

The same with a, b, c: we’re often happier with wordy but
meaningful names like n_unique_bookings or things like that.

Because the backslash has special meaning in strings, for Windows paths
we often use a “raw string” like this:

 r'C:\Bookings.csv'

because in a raw string the backslash is not special. For your two
filenames you’re ok, but if you’d made a new.csv then this:

 'C:\new.csv'

actually contains a newline character, not the two characters \ and
n. Whereas:

 r'C:\new.csv'

does what you want.

You can also use UNIX style forward slashes in Windows paths:

 'C:/new.csv'

which sidesteps the backslash issue.
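
A quick way to see the difference between these spellings is to compare the strings directly. This sketch only builds the strings and does not touch the filesystem; the pathlib comparison is an extra illustration, not something from the posts above.

```python
from pathlib import PureWindowsPath

plain = 'C:\new.csv'     # \n is interpreted as a single newline character
raw = r'C:\new.csv'      # raw string: backslash and n stay as two characters
forward = 'C:/new.csv'   # forward slash, no escaping involved

print(len(plain), len(raw), len(forward))
print('\n' in plain, '\n' in raw)

# Interpreted as Windows paths, the raw and forward-slash spellings
# compare equal.
print(PureWindowsPath(raw) == PureWindowsPath(forward))
```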

Cheers,
Cameron Simpson cs@cskk.id.au

shomikc · June 18, 2023, 11:57am

Thank you so much.

shomikc · June 27, 2023, 9:43am

Hello Cameron Sir,

I have a problem in pandas: merging two columns (from_city and to_city) to find the travel route, then finding customers who have travelled more than once (using groupby, I think), and finding the maximum frequency of each route (using mode(), I think). Can I mail you the dataset and question at cs@cskk.id.au, if it is okay with you? Thanks sir.
