Is possible to put like condition filter using json

hemanth22 · March 23, 2021, 12:37pm

Hello team,

I want to understand, is it possible to filter below json data using python.

{"logs": [{"filename": "ABC_PARIS_FILE_01.gz", "transmission_start": "23:03:2021 15:21", "transmission_end": "23:03:2021 15:22"},
{"filename": "ABC_INDIA_FILE_02.gz", "transmission_start": "23:03:2021 15:21", "transmission_end": "23:03:2021 15:22"},
{"filename": "ABC_PARIS_FILE_02.gz", "transmission_start": "23:03:2021 15:21", "transmission_end": "23:03:2021 15:22"},
{"filename": "ABC_INDIA_FILE_03.gz", "transmission_start": "23:03:2021 15:21", "transmission_end": "23:03:2021 15:22"},
{"filename": "ABC_INDIA_FILE_02.gz", "transmission_start": "23:03:2021 15:21", "transmission_end": "23:03:2021 15:22"}]}

I am trying to filter filename using like condition filename like ‘ABC_PARIS%’ to display only paris files,
can you please share idea how to do it using python.

Thanks,
Hemanth.

erlendaasland · March 23, 2021, 12:43pm

I suggest you take a look at the json and re modules in the standard library. You can parse JSON data using the json module, and you can search for patterns using the re module.

cameron · March 23, 2021, 10:27pm

You’ve been pointed at the “re” module. But SQL LIKE conditions like
“ABC_PARIS%” are far more usefully checked with the str.startswith()
method. Example:

record["filename"].startswith("ABC_PARIS")

which is a Boolean you can use in an if-statement or the like.

logs = json_data["logs"]
for record in logs:
    if record["filename"].startswith("ABC_PARIS"):
        ... do stuff ...

The “re” module should almost never be your first choice for testing
strings for simple stuff when the string methods themselves have a bunch
of available simple tests.

Regexps have many uses, but they’re cryptic, somewhat expensive, and
error prone particularly when new to them.

Cheers,
Cameron Simpson cs@cskk.id.au

erlendaasland · March 23, 2021, 10:36pm

True, startswith() is a better tool for this job.

hemanth22 · March 30, 2021, 7:17am

Thank you @cameron , i need one more help, after executing below code it worked fine, but i am not able to list all paris and india file in order, it is only one filename from paris and one filename from india in table format, is there any way to convert json to table data.

import json

from tabulate import tabulate

with open('datamart.json','r') as f:

    data = json.load(f)

    for p in data['logs']:

        if str(p['filename']).startswith("ABC_PARIS"):

            pfilename = p['filename']

            pstart = p['transmission_start']

            pend = p['transmission_end']

            print(pfilename,pstart,pend)

with open('datamart.json','r') as f:

    data = json.load(f)

    for p in data['logs']:

        if str(p['filename']).startswith("ABC_INDIA"):

            plfilename = p['filename']

            plstart = p['transmission_start']

            plend = p['transmission_end']

            print(plfilename,plstart,plend)

table = [[pfilename,pstart,pend],[plfilename,plstart,plend]]

finaltablehtml=(tabulate(table, headers="firstrow", tablefmt="html"))

finaltabletext=(tabulate(table, headers="firstrow", tablefmt="grid"))

print(finaltabletext);

hemanth22 · March 31, 2021, 6:50am

Can anyone help me on above query.

Maroloccio · March 31, 2021, 8:02am

Can anyone help me on above query.

Could you provide:

sample data
an example of an actual condition you’d like to match, e.g.:
- … what would match
- … what would be excluded from the match

hemanth22 · April 1, 2021, 6:32am

Hello Marco,

Below is the datamart.json data.

{"logs": [{"filename": "ABC_PARIS_FILE_01.gz", "transmission_start": "23:03:2021 15:21", "transmission_end": "23:03:2021 15:22"},
{"filename": "ABC_INDIA_FILE_01.gz", "transmission_start": "23:03:2021 15:21", "transmission_end": "23:03:2021 15:22"},
{"filename": "ABC_PARIS_FILE_02.gz", "transmission_start": "23:03:2021 15:21", "transmission_end": "23:03:2021 15:22"},
{"filename": "ABC_INDIA_FILE_03.gz", "transmission_start": "23:03:2021 15:21", "transmission_end": "23:03:2021 15:22"},
{"filename": "ABC_INDIA_FILE_02.gz", "transmission_start": "23:03:2021 15:21", "transmission_end": "23:03:2021 15:22"}]}

Below is the code

import json

from tabulate import tabulate

with open('datamart.json','r') as f:

    data = json.load(f)

    for p in data['logs']:

        if str(p['filename']).startswith("ABC_PARIS"):

            pfilename = p['filename']

            pstart = p['transmission_start']

            pend = p['transmission_end']

            print(pfilename,pstart,pend)

with open('datamart.json','r') as f:

    data = json.load(f)

    for p in data['logs']:

        if str(p['filename']).startswith("ABC_INDIA"):

            plfilename = p['filename']

            plstart = p['transmission_start']

            plend = p['transmission_end']

            print(plfilename,plstart,plend)

table = [[pfilename,pstart,pend],[plfilename,plstart,plend]]

finaltablehtml=(tabulate(table, headers="firstrow", tablefmt="html"))

finaltabletext=(tabulate(table, headers="firstrow", tablefmt="grid"))

print(finaltabletext);

I am getting output as below.

+------------------------+--------------------+--------------------+
| ABC_PARIS_FILE_02.gz   | 23:03:2021 15:21   | 23:03:2021 15:22   |
+========================+====================+====================+
| ABC_INDIA_FILE_02.gz   | 23:03:2021 15:21   | 23:03:2021 15:22   |
+------------------------+--------------------+--------------------+

But i am looking for below output.

+========================+====================+====================+
| ABC_PARIS_FILE_01.gz   | 23:03:2021 15:21   | 23:03:2021 15:22   |
+------------------------+--------------------+--------------------+
| ABC_PARIS_FILE_02.gz   | 23:03:2021 15:21   | 23:03:2021 15:22   |
+========================+====================+====================+
| ABC_INDIA_FILE_01.gz   | 23:03:2021 15:21   | 23:03:2021 15:22   |
+------------------------+--------------------+--------------------+
| ABC_INDIA_FILE_02.gz   | 23:03:2021 15:21   | 23:03:2021 15:22   |
+========================+====================+====================+
| ABC_INDIA_FILE_03.gz   | 23:03:2021 15:21   | 23:03:2021 15:22   |
+------------------------+--------------------+--------------------+

Can you please help me in getting output like above.

Thanks,
Hemanth.

Maroloccio · April 1, 2021, 7:20am

Hi. As a minimalistic example, this:

import json
from tabulate import tabulate
from pathlib import Path
data = json.loads(Path("datamart.json").read_text())
keys = ["filename", "transmission_start", "transmission_end"]
table = [tuple(entry[x] for x in keys)
         for entry in data["logs"]]
print(tabulate(sorted(table), headers=keys, tablefmt="grid"))

outputs:

+----------------------+----------------------+--------------------+
| filename             | transmission_start   | transmission_end   |
+======================+======================+====================+
| ABC_INDIA_FILE_01.gz | 23:03:2021 15:21     | 23:03:2021 15:22   |
+----------------------+----------------------+--------------------+
| ABC_INDIA_FILE_02.gz | 23:03:2021 15:21     | 23:03:2021 15:22   |
+----------------------+----------------------+--------------------+
| ABC_INDIA_FILE_03.gz | 23:03:2021 15:21     | 23:03:2021 15:22   |
+----------------------+----------------------+--------------------+
| ABC_PARIS_FILE_01.gz | 23:03:2021 15:21     | 23:03:2021 15:22   |
+----------------------+----------------------+--------------------+
| ABC_PARIS_FILE_02.gz | 23:03:2021 15:21     | 23:03:2021 15:22   |
+----------------------+----------------------+--------------------+

Does it help?

You can customise your sorting as you please.

hemanth22 · April 1, 2021, 7:26am

Yes it worked thank you very much