Help: json.loads() cannot parse valid json

python code:

import json
x = json.loads('{"message":"variable `z` is assigned to, but never used","code":{"code":"unused_variables","explanation":null},"level":"warning","spans":[{"file_name":"main.rs","byte_start":191,"byte_end":192,"line_start":8,"line_end":8,"column_start":9,"column_end":10,"is_primary":true,"text":[{"text":"    let z = \"this is a relatively long string, to see the diff between strings and code.\";","highlight_start":9,"highlight_end":10}],"label":null,"suggested_replacement":null,"suggestion_applicability":null,"expansion":null}],"children":[{"message":"`#[warn(unused_variables)]` on by default","code":null,"level":"note","spans":[],"children":[],"rendered":null},{"message":"consider using `_z` instead","code":null,"level":"note","spans":[],"children":[],"rendered":null}],"rendered":"warning: variable `z` is assigned to, but never used\n --> main.rs:8:9\n  |\n8 |     let z = \"this is a relatively long string, to see the diff between strings and code.\";\n  |         ^\n  |\n  = note: `#[warn(unused_variables)]` on by default\n  = note: consider using `_z` instead\n\n"}')
print(x)

gives the error message:

  File "/usr/lib/python3.7/json/__init__.py", line 348, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python3.7/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib/python3.7/json/decoder.py", line 353, in raw_decode
    obj, end = self.scan_once(s, idx)
json.decoder.JSONDecodeError: Expecting ',' delimiter: line 1 column 303 (char 302)

however, the json that i used is valid


so i have no idea what could have happened. thanks!

It looks like the problem is with this particular part of the JSON string:

{"text":"    let z = \"this is a relatively long string, to see the diff between strings and code.\";"

and isolating it does indeed produce the error you’re getting:

import json
x = json.loads('{"text":"    let z = \"this is a relatively long string, to see the diff between strings and code.\";"}')
print(x)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/jmorris/.pyenv/versions/3.10.0/lib/python3.10/json/__init__.py", line 346, in loads
    return _default_decoder.decode(s)
  File "/Users/jmorris/.pyenv/versions/3.10.0/lib/python3.10/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/Users/jmorris/.pyenv/versions/3.10.0/lib/python3.10/json/decoder.py", line 353, in raw_decode
    obj, end = self.scan_once(s, idx)
json.decoder.JSONDecodeError: Expecting ',' delimiter: line 1 column 23 (char 22)

My guess is that the problem lies with the two escaped quotes \" in the string as they do not stay escaped when interpreted by Python. Using a raw string (r'') mitigates this:

import json
x = json.loads(r'{"text":"    let z = \"this is a relatively long string, to see the diff between strings and code.\";"}')
print(x)

The string is not valid JSON because of the escape sequences.

You have at least four cases of the escape sequence \" (backslash
quote) in your string, but in the Python interpreter, they are escape
sequences and are interpreted as a plain old quote:

>>> print("abcd\"efgh")
abcd"efgh

You need to either escape the backslashes so that they remain in the
string and are visible to the JSON decoder, or use a raw string.
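To make that concrete, here is a minimal sketch using a shortened, made-up payload rather than the original one. The regular literal loses its backslashes before json.loads() ever sees the string, while the raw literal keeps them:

```python
import json

# Regular literal: Python turns \" into a bare " while building the string.
regular = '{"text": "let z = \"hi\";"}'
# Raw literal: the backslashes survive and reach the JSON decoder.
raw = r'{"text": "let z = \"hi\";"}'

print(regular)  # shows the text without any backslashes
print(raw)      # shows the text with \" intact

try:
    json.loads(regular)
    regular_ok = True
except json.JSONDecodeError as exc:
    regular_ok = False
    print("regular literal failed:", exc)

parsed = json.loads(raw)
print(parsed["text"])
```

The two strings differ by exactly the two backslashes that Python consumed.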

Tested and works:

You can fix the escape sequence problem by using a raw string. At the
beginning of your JSON loads command, change this:

x = json.loads('{"message": ... # blah blah blah blah

to this:

x = json.loads(r'{"message": ... # blah blah blah blah

The r prefix tells the interpreter to treat the backslashes as ordinary
characters, not escape sequences. So they are inserted into the string
and passed to the JSON decoder.

A second possible solution, which I have not tested, is to escape the
backslashes by doubling them: write \\" instead of \". Inside a regular
string literal, tripled backslashes (\\\") produce the same two
characters, so either form should work. Every other backslash in the
string, such as the ones in \n, has to be doubled as well.

Or just use the raw string r'...' syntax instead.
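As a quick check of the backslash-doubling idea, here is a small sketch with a made-up payload (not the original one). It compares the raw literal with the doubled and tripled forms, and also shows what happens if a \n is left as a single backslash:

```python
import json

raw = r'{"text": "say \"hi\"\n"}'
doubled = '{"text": "say \\"hi\\"\\n"}'
tripled = '{"text": "say \\\"hi\\\"\\n"}'

# All three literals build the same string object contents.
same = raw == doubled == tripled
print(same)

value = json.loads(doubled)["text"]
print(repr(value))

# A single-backslash \n becomes a real newline inside the JSON string,
# which the decoder rejects as a control character.
try:
    json.loads('{"text": "a\nb"}')
    newline_ok = True
except json.JSONDecodeError:
    newline_ok = False
print(newline_ok)
```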

ooh, thanks, this worked!

If my json is in a variable, how do I specify this?

I tried json.loads(msg_json.replace("\\", r"\\")) but still get the same error.

My json looks like this (pasting some part):

msg_json = """
{
  "eventTimeUtc": "2022-04-18T17:46:31.0372926Z",
"Data": "[{\"Id\":20737,\"DateCreated\":\"\\/Date(1650303990870-0400)\\/\",\"SessionId\":\"cae570cc-1b93-4de6-b7fc-3692a9bb74cf\",\"ActivityId\":1,\"UserId\":12566,\"UserName\":\"J CA, Jose\",\"UploadedForUser\":null,\"CompanyId\":13077,\"CompanyName\":\"fm\",\"DocumentId\":15340,\"DocumentTitle\":\"doc - Copy (40) - Copy\",\"DocumentType\":null,\"DocumentTypeId\":1,\"DocumentIndex\":\"13\",\"DocumentStatus\":\"Active\",\"IsCleanDocument\":false,\"Version\":null,\"CheckListId\":0,\"CheckListTitle\":\"Testing\",\"CheckListItemId\":6061,\"CheckListItemTitle\":\"Temp\",\"CheckListItemStatus\":\"\",\"CheckListItemSchema\":\"V.C.\",\"Value\":null,\"FullDocumentTitle\":\"doc - Copy (40) - Copy\",\"DocumentDeleted\":false,\"DocumentFilename\":\"doc - Copy (40) - Copy.txt\",\"CheckListSchemaOptionTypeId\":1,\"CheckListActive\":true,\"CheckListItemDeleted\":false,\"FullFolderPath\":\"\\\\Testing\\\\Temp\",\"UserSupport\":false,\"ContentStatus\":null,\"DocumentVersion\":1,\"Xml\":null,\"JobId\":\"6dc01e99-9990-4841-b3aa-f44059956a0f\",\"SecondaryId\":\"c2664d1f-b775-47b6-b44f-76cf61779db8\",\"GroupId\":3,\"GroupName\":\"Administrators\",\"AppId\":\"xw2IM8jr\",\"MessageId\":\"466e34d1-54ee-46a2-a40d-52a46c10e3d1\"},{\"Id\":20738,\"DateCreated\":\"\\/Date(1650303990870-0400)\\/\",\"SessionId\":\"cae570cc-1b93-4de6-b7fc-3692a9bb74cf\",\"ActivityId\":1,\"UserId\":12566,\"UserName\":\"J CA, Jose\",\"UploadedForUser\":null,\"CompanyId\":13077,\"CompanyName\":\"fm\",\"DocumentId\":15341,\"DocumentTitle\":\"doc - Copy (41) - Copy\",\"DocumentType\":null,\"DocumentTypeId\":1,\"DocumentIndex\":\"14\",\"DocumentStatus\":\"Active\",\"IsCleanDocument\":false,\"Version\":null,\"CheckListId\":0,\"CheckListTitle\":\"Testing\",\"CheckListItemId\":6061,\"CheckListItemTitle\":\"Temp\",\"CheckListItemStatus\":\"\",\"CheckListItemSchema\":\"V.C.\",\"Value\":null,\"FullDocumentTitle\":\"doc - Copy (41) - Copy\",\"DocumentDeleted\":false,\"DocumentFilename\":\"doc - Copy (41) - 
Copy.txt\",\"CheckListSchemaOptionTypeId\":1,\"CheckListActive\":true,\"CheckListItemDeleted\":false,\"FullFolderPath\":\"\\\\Testing\\\\Temp\",\"UserSupport\":false,\"ContentStatus\":null,\"DocumentVersion\":1,\"Xml\":null,\"JobId\":\"6dc01e99-9990-4841-b3aa-f44059956a0f\",\"SecondaryId\":\"1467dbbd-7dcc-4d56-accf-8078ead8f23a\",\"GroupId\":3,\"GroupName\":\"Administrators\",\"AppId\":\"xw2IM8jr\",\"MessageId\":\"bac24110-ea8d-4b9f-8b21-71ff14899d14\"}]",
"DataType": "DocumentActivityHistory",
}
"""

Putting the r before the """ solved the problem:

msg_json = r"""
{
  "eventTimeUtc": "2022-04-18T17:46:31.0372926Z",
"Data": "[{\"Id\":20737,\"DateCreated\":\"\\/Date(1650303990870-0400)\\/\",\"SessionId\":\"cae570cc-1b93-4de6-b7fc-3692a9bb74cf\",\"ActivityId\":1,\"UserId\":12566,\"UserName\":\"J CA, Jose\",\"UploadedForUser\":null,\"CompanyId\":13077,\"CompanyName\":\"fm\",\"DocumentId\":15340,\"DocumentTitle\":\"doc - Copy (40) - Copy\",\"DocumentType\":null,\"DocumentTypeId\":1,\"DocumentIndex\":\"13\",\"DocumentStatus\":\"Active\",\"IsCleanDocument\":false,\"Version\":null,\"CheckListId\":0,\"CheckListTitle\":\"Testing\",\"CheckListItemId\":6061,\"CheckListItemTitle\":\"Temp\",\"CheckListItemStatus\":\"\",\"CheckListItemSchema\":\"V.C.\",\"Value\":null,\"FullDocumentTitle\":\"doc - Copy (40) - Copy\",\"DocumentDeleted\":false,\"DocumentFilename\":\"doc - Copy (40) - Copy.txt\",\"CheckListSchemaOptionTypeId\":1,\"CheckListActive\":true,\"CheckListItemDeleted\":false,\"FullFolderPath\":\"\\\\Testing\\\\Temp\",\"UserSupport\":false,\"ContentStatus\":null,\"DocumentVersion\":1,\"Xml\":null,\"JobId\":\"6dc01e99-9990-4841-b3aa-f44059956a0f\",\"SecondaryId\":\"c2664d1f-b775-47b6-b44f-76cf61779db8\",\"GroupId\":3,\"GroupName\":\"Administrators\",\"AppId\":\"xw2IM8jr\",\"MessageId\":\"466e34d1-54ee-46a2-a40d-52a46c10e3d1\"},{\"Id\":20738,\"DateCreated\":\"\\/Date(1650303990870-0400)\\/\",\"SessionId\":\"cae570cc-1b93-4de6-b7fc-3692a9bb74cf\",\"ActivityId\":1,\"UserId\":12566,\"UserName\":\"J CA, Jose\",\"UploadedForUser\":null,\"CompanyId\":13077,\"CompanyName\":\"fm\",\"DocumentId\":15341,\"DocumentTitle\":\"doc - Copy (41) - Copy\",\"DocumentType\":null,\"DocumentTypeId\":1,\"DocumentIndex\":\"14\",\"DocumentStatus\":\"Active\",\"IsCleanDocument\":false,\"Version\":null,\"CheckListId\":0,\"CheckListTitle\":\"Testing\",\"CheckListItemId\":6061,\"CheckListItemTitle\":\"Temp\",\"CheckListItemStatus\":\"\",\"CheckListItemSchema\":\"V.C.\",\"Value\":null,\"FullDocumentTitle\":\"doc - Copy (41) - Copy\",\"DocumentDeleted\":false,\"DocumentFilename\":\"doc - Copy (41) - 
Copy.txt\",\"CheckListSchemaOptionTypeId\":1,\"CheckListActive\":true,\"CheckListItemDeleted\":false,\"FullFolderPath\":\"\\\\Testing\\\\Temp\",\"UserSupport\":false,\"ContentStatus\":null,\"DocumentVersion\":1,\"Xml\":null,\"JobId\":\"6dc01e99-9990-4841-b3aa-f44059956a0f\",\"SecondaryId\":\"1467dbbd-7dcc-4d56-accf-8078ead8f23a\",\"GroupId\":3,\"GroupName\":\"Administrators\",\"AppId\":\"xw2IM8jr\",\"MessageId\":\"bac24110-ea8d-4b9f-8b21-71ff14899d14\"}]",
"DataType": "DocumentActivityHistory",
}
"""

But what if the dict is in a variable, coming in from a Queue:

msg_json = event['Records']
for d in msg_json:
    body = json.loads(d['body'])

How do I convert it to a raw string in this case?

You don’t. A raw string isn’t a different kind of object, it’s a way of creating a string from text in the code. As long as the queue has the correct string, there’s no problem.
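A small sketch of that point. The body variable here is only a stand-in for a queue message, built with json.dumps() on the assumption that the producer serializes its data correctly:

```python
import json

# Two ways of typing the same text into source code.
raw = r'{"path": "\\Testing\\Temp"}'
escaped = '{"path": "\\\\Testing\\\\Temp"}'
print(type(raw) is str, raw == escaped)

# Stand-in for a message body arriving from a queue: no literal, no r prefix.
body = json.dumps({"path": "\\Testing\\Temp"})
print(body == raw)

path = json.loads(body)["path"]
print(path)
```

The r prefix only changes how the source text is read; the resulting object is an ordinary str either way.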

Where is the string that is getting placed into the queue coming from?

It is a dict being generated by another system and placed in the queue.

{
  "eventTimeUtc": "2022-04-18T17:46:31.0372926Z",
  "appId": "xw2IM8jr",
  "sessionId": "cae570cc-1b93-4de6-b7fc-3692a9bb74cf",
  "correlationId": "47b6e817-4f09-4578-b2c5-a126624dae9a",
  "transactionId": "6ccbe933-4c4f-4f56-89ff-f51f070aa434",
  "ipAddress": null,
  "isSystemEvent": false,
  "instanceId": 1,
  "projectId": 8420,
  "userProfileId": 15536,
  "eventType": "fx.Bus.Client.Messages.Background.BackgroundMessage",
  "metadata": {
    "sequenceOrder": "After",
    "dataType": "DocumentActivityHistory",
    "dataTypeMethod": "AddActivityHistory",
    "instance": {
      "id": 1,
      "clientName": "Dev",
      "domain": "http://localhost",
      "instanceUser": {
        "id": 12566,
        "companyId": 13077,
        "companyName": "fx"
      }
    },
    "project": {
      "id": 8420,
      "name": null,
      "groupMembership": {
        "id": 3,
        "name": "Administrators"
      }
    },
    "userProfile": {
      "id": 15536,
      "firstName": "John",
      "lastName": "Smith CA",
      "email": "John.Smith@fx.ca"
    },
    "region": "Canada"
  },
  "webRequest": null,
  "eventData": {
    "BusinessObject": "Background",
    "Data": "[{\"Id\":20737,\"DateCreated\":\"\\/Date(1650303990870-0400)\\/\",\"SessionId\":\"cae570cc-1b93-4de6-b7fc-3692a9bb74cf\",\"ActivityId\":1,\"UserId\":12566,\"UserName\":\"Smith CA, John\",\"UploadedForUser\":null,\"CompanyId\":13077,\"CompanyName\":\"fx\",\"DocumentId\":15340,\"DocumentTitle\":\"doc - Copy (40) - Copy\",\"DocumentType\":null,\"DocumentTypeId\":1,\"DocumentIndex\":\"13\",\"DocumentStatus\":\"Active\",\"IsCleanDocument\":false,\"Version\":null,\"CheckListId\":0,\"CheckListTitle\":\"Testing\",\"CheckListItemId\":6061,\"CheckListItemTitle\":\"Temp\",\"CheckListItemStatus\":\"\",\"CheckListItemSchema\":\"V.C.\",\"Value\":null,\"FullDocumentTitle\":\"doc - Copy (40) - Copy\",\"DocumentDeleted\":false,\"DocumentFilename\":\"doc - Copy (40) - Copy.txt\",\"CheckListSchemaOptionTypeId\":1,\"CheckListActive\":true,\"CheckListItemDeleted\":false,\"FullFolderPath\":\"\\\\Testing\\\\Temp\",\"UserSupport\":false,\"ContentStatus\":null,\"DocumentVersion\":1,\"Xml\":null,\"JobId\":\"6dc01e99-9990-4841-b3aa-f44059956a0f\",\"SecondaryId\":\"c2664d1f-b775-47b6-b44f-76cf61779db8\",\"GroupId\":3,\"GroupName\":\"Administrators\",\"AppId\":\"xw2IM8jr\",\"MessageId\":\"466e34d1-54ee-46a2-a40d-52a46c10e3d1\"},{\"Id\":20738,\"DateCreated\":\"\\/Date(1650303990870-0400)\\/\",\"SessionId\":\"cae570cc-1b93-4de6-b7fc-3692a9bb74cf\",\"ActivityId\":1,\"UserId\":12566,\"UserName\":\"Smith CA, John\",\"UploadedForUser\":null,\"CompanyId\":13077,\"CompanyName\":\"fx\",\"DocumentId\":15341,\"DocumentTitle\":\"doc - Copy (41) - Copy\",\"DocumentType\":null,\"DocumentTypeId\":1,\"DocumentIndex\":\"14\",\"DocumentStatus\":\"Active\",\"IsCleanDocument\":false,\"Version\":null,\"CheckListId\":0,\"CheckListTitle\":\"Testing\",\"CheckListItemId\":6061,\"CheckListItemTitle\":\"Temp\",\"CheckListItemStatus\":\"\",\"CheckListItemSchema\":\"V.C.\",\"Value\":null,\"FullDocumentTitle\":\"doc - Copy (41) - Copy\",\"DocumentDeleted\":false,\"DocumentFilename\":\"doc - Copy (41) - 
Copy.txt\",\"CheckListSchemaOptionTypeId\":1,\"CheckListActive\":true,\"CheckListItemDeleted\":false,\"FullFolderPath\":\"\\\\Testing\\\\Temp\",\"UserSupport\":false,\"ContentStatus\":null,\"DocumentVersion\":1,\"Xml\":null,\"JobId\":\"6dc01e99-9990-4841-b3aa-f44059956a0f\",\"SecondaryId\":\"1467dbbd-7dcc-4d56-accf-8078ead8f23a\",\"GroupId\":3,\"GroupName\":\"Administrators\",\"AppId\":\"xw2IM8jr\",\"MessageId\":\"bac24110-ea8d-4b9f-8b21-71ff14899d14\"}]",
    "DataType": "DocumentActivityHistory",
    "Action": "",
    "DataTypeMethod": 0,
    "Method": "AddActivityHistory",
    "Result": {
      "Status": null,
      "OutputData": null,
      "Errors": []
    },
    "Name": "DocumentActivityHistory",
    "InstanceId": 1,
    "DealRoomId": 8420,
    "User": {
      "SiteUserId": 12566,
      "SessionId": "cae570cc-1b93-4de6-b7fc-3692a9bb74cf",
      "AppId": "xw2IM8jr",
      "UserProfileId": 15536
    },
    "IncludeDeletedProjects": false,
    "TransactionId": "6ccbe933-4c4f-4f56-89ff-f51f070aa434",
    "ParentTransactionId": null,
    "DateCreated": "2022-04-18T17:46:31.02+00:00",
    "SendReply": true,
    "CorrelationId": "47b6e817-4f09-4578-b2c5-a126624dae9a",
    "Attachment": null,
    "SagaMessagePayloadType": null,
    "SagaMessagePayload": null,
    "SagaReply": null,
    "DataAccess": 2
  }
}

The array called Data has a lot of backslashes that cause problems.

If it’s already a dict, then there’s no JSON to load at all. Instead of loading, were you asking about converting it to a json string (.dumps())?

I am using json.loads() to convert it to a Python dict, and the escape characters make the json string invalid.

json.loads works on a string. Where does the string come from? What is creating it? Is it read from a file (what creates the file?). Is it read from a web request? Whatever creates the string is responsible for making sure it’s valid json. Passing it around later doesn’t matter.
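In the message pasted above, Data looks like a JSON document stored as a string inside the outer JSON, which would explain all the backslashes. If that is the case, and if the body in the queue is valid JSON text, decoding twice may be all that is needed. A rough sketch with a hypothetical, heavily shortened record (the event shape is simulated here, not taken from a real queue):

```python
import json

# The inner list is serialized first, then embedded as a string in the
# outer message, which is serialized again to form the record body.
inner = [{"Id": 20737, "FullFolderPath": "\\Testing\\Temp"}]
outer = {"eventData": {"Data": json.dumps(inner),
                       "DataType": "DocumentActivityHistory"}}
event = {"Records": [{"body": json.dumps(outer)}]}

for record in event["Records"]:
    body = json.loads(record["body"])             # first pass: outer message
    rows = json.loads(body["eventData"]["Data"])  # second pass: embedded string
    print(rows[0]["Id"], rows[0]["FullFolderPath"])
```

After the first pass, body["eventData"]["Data"] is still a str; only the second json.loads() turns it into a list of dicts.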

It comes from an SQS queue. I don’t have control over its format.

As pasted above, I need some tips on how to handle it if it’s in a variable.

Same question here. How to load and format response properly from a variable

Can you give more detail? As long as you have valid JSON, you can load it into an object with json.load() or loads().

@BowlOfRed server response is

{
    "overriding_parameters": {
        "jar_params": [
            "{\"aggregationType\":\"Type1\",\"startDate\":\"2022-05-10\",\"endDate\":\"2022-05-10\"}"
        ]
    }
}

If I do the following:

import json
dummy_response = r'{"overriding_parameters": {"jar_params": ["{\"aggregationType\":\"Type1\",\"startDate\":\"2022-05-10\",\"endDate\":\"2022-05-10\"}"]}}'
dummy_dict = json.loads(dummy_response)

it works fine.

But, this server response is coming from airflow xcom as a string:

response = kwargs['ti'].xcom_pull(task_ids='get_run_list')
type(response) # str
metrics = json.loads(response)

How to correctly substitute r in this case? type(response) is str.

I tried this one

json.loads(r"{}".format(response))

But it doesn’t work.

How to handle it properly if the response is a variable, not raw str?

Not valid JSON. That’s not the right way to send data. Is it supposed to be a JSON response?

You might be able to decode it, but it’s not JSON. This sort of works for that particular code, not sure it would for any other.

import ast

s = "{\"aggregationType\":\"Type1\",\"startDate\":\"2022-05-10\",\"endDate\":\"2022-05-10\"}"
s2 = ast.literal_eval(s)
print(s2)
{'aggregationType': 'Type1', 'startDate': '2022-05-10', 'endDate': '2022-05-10'}

https://jsonlint.com/ would not agree with you that it’s not valid JSON. This is valid JSON.

Yeah, I’ve tried the ast.literal_eval(s) approach, but the issue is how to reach the right part of the whole string to run the evaluation on. You cut the string down to one particular part instead of using the full string '{"overriding_parameters": {"jar_params": ["{\"aggregationType\":\"Type1\",\"startDate\":\"2022-05-10\",\"endDate\":\"2022-05-10\"}"]}}', so the question is how to handle the full one.

Never mind. Will make another post here in a bit.

You shouldn’t need to. The r"" expression is only for allowing you type in literals in your program. A real str is already properly set up.

If json.loads(r'...') works, but json.loads(s) doesn’t, then there’s some difference in the string.

Can you show the output of both print(response) and print(repr(response))?
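For comparison, this is roughly what those two calls show for a string that is known to be good. The response variable below is an assumed stand-in for the value returned by xcom_pull, built with json.dumps() so that it is certainly valid JSON; the real value may differ, which is exactly what the two prints would reveal:

```python
import json

# Assumed stand-in for the xcom value: the inner parameters are serialized
# to a string first, then placed inside the outer document.
params = {"aggregationType": "Type1",
          "startDate": "2022-05-10",
          "endDate": "2022-05-10"}
response = json.dumps({"overriding_parameters": {"jar_params": [json.dumps(params)]}})

print(response)        # the characters json.loads() will actually see
print(repr(response))  # the same string written out as a Python literal

metrics = json.loads(response)
first_param = metrics["overriding_parameters"]["jar_params"][0]
jar_params = json.loads(first_param)  # second pass for the embedded string
print(jar_params["aggregationType"])
```

If print(response) on the real value shows something other than backslash-quote pairs inside jar_params, the string differs from the literal that worked, and the fix belongs wherever the string is produced.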