Is possible to put like condition filter using json

Hello team,

I want to understand, is it possible to filter below json data using python.

{"logs": [{"filename": "ABC_PARIS_FILE_01.gz", "transmission_start": "23:03:2021 15:21", "transmission_end": "23:03:2021 15:22"},
{"filename": "ABC_INDIA_FILE_02.gz", "transmission_start": "23:03:2021 15:21", "transmission_end": "23:03:2021 15:22"},
{"filename": "ABC_PARIS_FILE_02.gz", "transmission_start": "23:03:2021 15:21", "transmission_end": "23:03:2021 15:22"},
{"filename": "ABC_INDIA_FILE_03.gz", "transmission_start": "23:03:2021 15:21", "transmission_end": "23:03:2021 15:22"},
{"filename": "ABC_INDIA_FILE_02.gz", "transmission_start": "23:03:2021 15:21", "transmission_end": "23:03:2021 15:22"}]}

I am trying to filter filename using like condition filename like ‘ABC_PARIS%’ to display only paris files,
can you please share idea how to do it using python.

Thanks,
Hemanth.

I suggest you take a look at the json and re modules in the standard library. You can parse JSON data using the json module, and you can search for patterns using the re module.

1 Like

You’ve been pointed at the “re” module. But SQL LIKE conditions like
“ABC_PARIS%” are far more usefully checked with the str.startswith()
method. Example:

record["filename"].startswith("ABC_PARIS")

which is a Boolean you can use in an if-statement or the like.

logs = json_data["logs"]
for record in logs:
    if record["filename"].startswith("ABC_PARIS"):
        ... do stuff ...

The “re” module should almost never be your first choice for testing
strings for simple stuff when the string methods themselves have a bunch
of available simple tests.

Regexps have many uses, but they’re cryptic, somewhat expensive, and
error prone particularly when new to them.

Cheers,
Cameron Simpson cs@cskk.id.au

1 Like

True, startswith() is a better tool for this job.

1 Like

Thank you @cameron , i need one more help, after executing below code it worked fine, but i am not able to list all paris and india file in order, it is only one filename from paris and one filename from india in table format, is there any way to convert json to table data.

import json

from tabulate import tabulate

with open('datamart.json','r') as f:

    data = json.load(f)

    for p in data['logs']:

        if str(p['filename']).startswith("ABC_PARIS"):

            pfilename = p['filename']

            pstart = p['transmission_start']

            pend = p['transmission_end']

            print(pfilename,pstart,pend)

with open('datamart.json','r') as f:

    data = json.load(f)

    for p in data['logs']:

        if str(p['filename']).startswith("ABC_INDIA"):

            plfilename = p['filename']

            plstart = p['transmission_start']

            plend = p['transmission_end']

            print(plfilename,plstart,plend)

table = [[pfilename,pstart,pend],[plfilename,plstart,plend]]

finaltablehtml=(tabulate(table, headers="firstrow", tablefmt="html"))

finaltabletext=(tabulate(table, headers="firstrow", tablefmt="grid"))

print(finaltabletext);

Can anyone help me on above query.

Can anyone help me on above query.

Could you provide:

  • sample data
  • an example of an actual condition you’d like to match, e.g.:
    • … what would match
    • … what would be excluded from the match

Hello Marco,

Below is the datamart.json data.

{"logs": [{"filename": "ABC_PARIS_FILE_01.gz", "transmission_start": "23:03:2021 15:21", "transmission_end": "23:03:2021 15:22"},
{"filename": "ABC_INDIA_FILE_01.gz", "transmission_start": "23:03:2021 15:21", "transmission_end": "23:03:2021 15:22"},
{"filename": "ABC_PARIS_FILE_02.gz", "transmission_start": "23:03:2021 15:21", "transmission_end": "23:03:2021 15:22"},
{"filename": "ABC_INDIA_FILE_03.gz", "transmission_start": "23:03:2021 15:21", "transmission_end": "23:03:2021 15:22"},
{"filename": "ABC_INDIA_FILE_02.gz", "transmission_start": "23:03:2021 15:21", "transmission_end": "23:03:2021 15:22"}]}

Below is the code

import json

from tabulate import tabulate

with open('datamart.json','r') as f:

    data = json.load(f)

    for p in data['logs']:

        if str(p['filename']).startswith("ABC_PARIS"):

            pfilename = p['filename']

            pstart = p['transmission_start']

            pend = p['transmission_end']

            print(pfilename,pstart,pend)

with open('datamart.json','r') as f:

    data = json.load(f)

    for p in data['logs']:

        if str(p['filename']).startswith("ABC_INDIA"):

            plfilename = p['filename']

            plstart = p['transmission_start']

            plend = p['transmission_end']

            print(plfilename,plstart,plend)

table = [[pfilename,pstart,pend],[plfilename,plstart,plend]]

finaltablehtml=(tabulate(table, headers="firstrow", tablefmt="html"))

finaltabletext=(tabulate(table, headers="firstrow", tablefmt="grid"))

print(finaltabletext);

I am getting output as below.

+------------------------+--------------------+--------------------+
| ABC_PARIS_FILE_02.gz   | 23:03:2021 15:21   | 23:03:2021 15:22   |
+========================+====================+====================+
| ABC_INDIA_FILE_02.gz   | 23:03:2021 15:21   | 23:03:2021 15:22   |
+------------------------+--------------------+--------------------+

But i am looking for below output.

+========================+====================+====================+
| ABC_PARIS_FILE_01.gz   | 23:03:2021 15:21   | 23:03:2021 15:22   |
+------------------------+--------------------+--------------------+
| ABC_PARIS_FILE_02.gz   | 23:03:2021 15:21   | 23:03:2021 15:22   |
+========================+====================+====================+
| ABC_INDIA_FILE_01.gz   | 23:03:2021 15:21   | 23:03:2021 15:22   |
+------------------------+--------------------+--------------------+
| ABC_INDIA_FILE_02.gz   | 23:03:2021 15:21   | 23:03:2021 15:22   |
+========================+====================+====================+
| ABC_INDIA_FILE_03.gz   | 23:03:2021 15:21   | 23:03:2021 15:22   |
+------------------------+--------------------+--------------------+

Can you please help me in getting output like above.

Thanks,
Hemanth.

Hi. As a minimalistic example, this:

import json
from tabulate import tabulate
from pathlib import Path
data = json.loads(Path("datamart.json").read_text())
keys = ["filename", "transmission_start", "transmission_end"]
table = [tuple(entry[x] for x in keys)
         for entry in data["logs"]]
print(tabulate(sorted(table), headers=keys, tablefmt="grid"))

outputs:

+----------------------+----------------------+--------------------+
| filename             | transmission_start   | transmission_end   |
+======================+======================+====================+
| ABC_INDIA_FILE_01.gz | 23:03:2021 15:21     | 23:03:2021 15:22   |
+----------------------+----------------------+--------------------+
| ABC_INDIA_FILE_02.gz | 23:03:2021 15:21     | 23:03:2021 15:22   |
+----------------------+----------------------+--------------------+
| ABC_INDIA_FILE_03.gz | 23:03:2021 15:21     | 23:03:2021 15:22   |
+----------------------+----------------------+--------------------+
| ABC_PARIS_FILE_01.gz | 23:03:2021 15:21     | 23:03:2021 15:22   |
+----------------------+----------------------+--------------------+
| ABC_PARIS_FILE_02.gz | 23:03:2021 15:21     | 23:03:2021 15:22   |
+----------------------+----------------------+--------------------+

Does it help?

You can customise your sorting as you please.

1 Like

Yes it worked thank you very much