Help: json.loads() cannot parse valid json

Yeah, here is a real example I’m struggling with. It’s absolutely the same string.

import json
dummy_response = '{"overriding_parameters": {"jar_params": ["{\"aggregationType\":\"Type1\",\"startDate\":\"2022-05-10\",\"endDate\":\"2022-05-10\"}"]}}'
json.loads(r'{"overriding_parameters": {"jar_params": ["{\"aggregationType\":\"Type1\",\"startDate\":\"2022-05-10\",\"endDate\":\"2022-05-10\"}"]}}') # works fine - test it out
json.loads(dummy_response) # raises JSONDecodeError
print(dummy_response) # {"overriding_parameters": {"jar_params": ["{"aggregationType":"Type1","startDate":"2022-05-10","endDate":"2022-05-10"}"]}}
print(repr(dummy_response)) # '{"overriding_parameters": {"jar_params": ["{"aggregationType":"Type1","startDate":"2022-05-10","endDate":"2022-05-10"}"]}}'

The two printed outputs differ only in the surrounding single quotes that repr adds.

Like I said, dummy_response (the contents of the variable after assignment) is not real JSON because you’re losing the backslashes. But you’re not really typing it in, I hope. Is that what the real response gives you?

In other words, the text is JSON, but the contents of the variable are not. r"" only applies to literals typed into source code, not to data received from a function. What’s the real data received from the function? Presumably the backslashes are already lost by that point?
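The difference between the two literals can be checked directly. This is a minimal sketch using a shortened payload; the names `plain`, `raw`, `outer` and `inner` are illustrative and not from the original code.

```python
import json

# Regular literal: Python turns each \" into a bare " before json ever sees it.
plain = '{"jar_params": ["{\"startDate\":\"2022-05-10\"}"]}'
# Raw literal: the backslashes survive, so the inner quotes stay escaped.
raw = r'{"jar_params": ["{\"startDate\":\"2022-05-10\"}"]}'

# The raw version is longer by one character per backslash.
print(len(plain), len(raw))

try:
    json.loads(plain)
    plain_ok = True
except json.JSONDecodeError as exc:
    plain_ok = False
    print("plain literal failed:", exc)

outer = json.loads(raw)
# The list element is itself a JSON document stored as a string; decode it again.
inner = json.loads(outer["jar_params"][0])
print(inner)
```

Note that the inner value needs a second json.loads() call, because the server wrapped one JSON document inside a string of another.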

No, this is a real response from the server. This response is pushed to airflow's xcom and then I read it. The response code is 200, and the response text itself is what I’ve shared.

Can you run this:

response = kwargs['ti'].xcom_pull(task_ids='get_run_list')
print(repr(response))
json.loads(response) 

(With a real request, not with a string literal in the code.)

Does it work? What does it print?

But we haven’t seen that. It is significant exactly how the text is encoded in the response. Can you actually show the repr() of the response rather than from something typed in?

I am having the same issue you had a year ago, but I don’t see any solution from this email chain. How did you resolve this issue? Thanks!

The examples from @lamtodor seemed to have Python string encoding problems, in that the failing examples were (to my eye) incorrectly written in their Python code. We never saw the raw server responses which were giving the error. So we don’t actually know what came from his server.

If you’re encountering a JSON decoding problem with a server response, please show us the raw server response, and the Python code you’re using to parse it.
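When gathering that information, something like the following may help. It is only a sketch: `diagnose` is a hypothetical helper and `sample` is a made-up stand-in for a real response, but `msg`, `pos`, `lineno` and `colno` are standard attributes of json.JSONDecodeError.

```python
import json

def diagnose(text):
    """Try to parse text as JSON; on failure, report where parsing stopped."""
    try:
        return json.loads(text)
    except json.JSONDecodeError as exc:
        # exc.pos is the index in text where the decoder gave up.
        start = max(exc.pos - 20, 0)
        print("error:", exc.msg, "at line", exc.lineno, "column", exc.colno)
        print("context:", repr(text[start:exc.pos + 20]))
        return None

# Stand-in for a response whose inner quote arrived unescaped.
sample = '{"proddesc": "BLACK 69.7" 11.2 OZ"}'
print(repr(sample))
result = diagnose(sample)
```

Posting the repr() output together with the reported context usually makes it clear whether the backslashes were missing in the data itself or were lost in a string literal.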

A double quote in a product description is very common, and a backslash is added to escape it when the data is sent to a client. For example, the JSON string below is returned from a server with “"” in the description. The test code below raises an error.

import json

d = '''
{
"icepeserlotinitsingle": {
    "whse": "0041",
    "prod": "865117669",
    "proddesc": "SOLTIS86 TRUE BLACK 69.7\" 11.2 OZ 54 YD RL",
    "proof": 15,
    "unit": "YD",
    "userfield": ""
  }
}
'''
dic = json.loads(d)
print(type(dic))
========================= RESTART: C:/Python/tststr.py =========================
Traceback (most recent call last):
  File "C:/Python/tststr.py", line 16, in <module>
    dic = json.loads(d)
  File "C:\Users\AppData\Local\Programs\Python\Python310\lib\json\__init__.py", line 346, in loads
    return _default_decoder.decode(s)
  File "C:\Users\AppData\Local\Programs\Python\Python310\lib\json\decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "C:\Users\AppData\Local\Programs\Python\Python310\lib\json\decoder.py", line 353, in raw_decode
    obj, end = self.scan_once(s, idx)
json.decoder.JSONDecodeError: Expecting ',' delimiter: line 6 column 44 (char 118)

If I remove “"” from the product description (“SOLTIS86 TRUE BLACK 69.7 11.2 OZ 54 YD RL”), the code runs fine. I’m not sure how to resolve this json.loads() issue.

Thanks for your help!

The problem here is that the \" inside a regular string literal simply means a literal quote character, so the backslash never reaches json.loads(). If you were reading this from a file, rather than embedding it directly in your source code, it would work fine; alternatively, use a raw literal so that backslashes represent themselves.

import json

d = r'''
{
"icepeserlotinitsingle": {
    "whse": "0041",
    "prod": "865117669",
    "proddesc": "SOLTIS86 TRUE BLACK 69.7\" 11.2 OZ 54 YD RL",
    "proof": 15,
    "unit": "YD",
    "userfield": ""
  }
}
'''
dic = json.loads(d)
print(type(dic))
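The point about reading from a file can be checked with a small round trip. This is a sketch under assumptions: the file name is made up, the payload is shortened to one field, and chr(92) is used to write the backslash so that no literal escaping is involved.

```python
import json
import os
import tempfile

# The text as a server would send it: the backslash is really in the data.
# chr(92) is a single backslash character.
payload = '{"proddesc": "SOLTIS86 TRUE BLACK 69.7' + chr(92) + '" 11.2 OZ 54 YD RL"}'

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "response.json")  # hypothetical file name
    with open(path, "w", encoding="utf-8") as f:
        f.write(payload)
    # Text read from a file is never processed as a Python literal.
    with open(path, encoding="utf-8") as f:
        dic = json.load(f)

print(dic["proddesc"])  # SOLTIS86 TRUE BLACK 69.7" 11.2 OZ 54 YD RL
```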

Your solution works great! For both reading in from input data and embedding it in the source code. Thank you!

[…]

In case Chris’ remark is unclear, he’s saying that you’d defined the
string d in your example Python like this:

 d = '''
      "key": "value with a quote \" here",
 '''

To check its content, put a print(d) in your code right after you
define d. Then you can see what is actually in d, which isn’t quite
as you’d hoped.

In a regular Python string literal, a backslash before a quote embeds
that quote in the string value, and the backslash itself is discarded.
In a raw string this is not done.

To see this, run this code:

 s1 = "string with quote \" here"
 print(s1)
 s2 = r"string with quote \" here"
 print(s2)

Your difficulty comes because JSON also uses this convention. I
imagine that you’ve copy/pasted the raw JSON into the Python code:

 d = '''
     JSON pasted here
 '''

The JSON contains a string containing a double quote. To express that,
the JSON backslashes that quote character.

For the JSON decode (json.loads) to work, that backslash must be in
the JSON string. But Python consumed it while reading your d='''
string.

As Chris says, the easiest way to preserve that is to use a “raw
string”, where the quoting convention doesn’t use a backslash. This is
as simple as using r''' instead of ''' in your assignment.

I’m not sure that is clearer than Chris’ example, but I hope the
examples help.

Final remark: the result of a “raw string” is still just a string. All
you’re doing is choosing a slightly different Python syntax to express
that string, which doesn’t consume backslashes.
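
That a raw string is an ordinary string can be confirmed with a short check. A minimal sketch with made-up sample data; `raw`, `doubled` and `round_trip` are illustrative names.

```python
import json

raw = r'{"desc": "69.7\" wide"}'
doubled = '{"desc": "69.7\\" wide"}'  # regular literal with the backslash escaped

print(type(raw))       # <class 'str'>
print(raw == doubled)  # True: two spellings of the same string value

data = json.loads(raw)
# json.dumps produces the escaped form, so there is no need to type it by hand.
round_trip = json.dumps(data)
print(round_trip)
```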

Cheers,
Cameron Simpson cs@cskk.id.au

This is a bad example on my part. Because the backslash is not special
to a raw string, s2 ends at the second ".

Better like this:

  s1 = 'string with quote \" here'
  print(s1)
  s2 = r'string with quote \" here'
  print(s2)

Cheers,
Cameron Simpson cs@cskk.id.au

Not quite. The Python parser first uses simple rules to figure out where the string ends, essentially by assuming that any character immediately after a backslash is not the final quote. Then in a later pass, when it actually figures out the string contents, it treats the \" sequence within the string as two separate characters (because now “the backslash is not special”).

>>> r"string with quote \" here"
'string with quote \\" here'

For the same reason, it is not possible to end a raw string literal with an odd number of backslashes:

>>> r"example\"
  File "<stdin>", line 1
    r"example\"
              ^
SyntaxError: EOL while scanning string literal
>>>
>>> # the result contains two backslashes, represented as four
>>> # because the canonical representation uses a normal literal.
>>> r"example\\" 
'example\\\\'
>>>
>>> r"example\\\"
  File "<stdin>", line 1
    r"example\\\"
                ^
SyntaxError: EOL while scanning string literal
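
The same behaviour can be checked from a script rather than the interactive prompt. This is a sketch, not the parser's own mechanism: it hands source text to ast.literal_eval, and `try_literal` is a hypothetical helper. The backslash is written as chr(92) so the test strings themselves need no escaping.

```python
import ast

BS = chr(92)  # one backslash, spelled without any escaping

def try_literal(source):
    """Return the value of a string literal given as source text, or None if it does not parse."""
    try:
        return ast.literal_eval(source)
    except SyntaxError:
        return None

inner_quote = try_literal('r"quote ' + BS + '" here"')  # r"quote \" here"
odd_tail = try_literal('r"example' + BS + '"')          # r"example\"
even_tail = try_literal('r"example' + BS * 2 + '"')     # r"example\\"

print(repr(inner_quote))  # backslash and quote are both kept in the value
print(odd_tail)           # None: the literal never terminates
print(len(even_tail))     # 9: "example" plus two backslashes
```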