Web scraping using Python and Requests UPDATED

UPDATED:

So I'm not sure why this question isn't getting any love.

Am I not being clear about what is needed? Or is it not possible to do what I want with this page?

Some feedback would be great… even just a pointer to the correct method of doing it!

Hi, I have multiple cars at any one time, and keeping track of the Regos (registrations) on all the cars is a bit of a headache.

I have tried making a program that does this for me, with me simply inputting the Rego, but I cannot get the results back.

import requests
from bs4 import BeautifulSoup


#url = 'https://httpbin.org/headers'
url = 'https://online.transport.wa.gov.au/webExternal/registration/?0-1.IBehaviorListener.1-layout-layout_body-' \
      'registrationRequestForm-searchButton&random=0.1450971120800384'

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/107.0.0.0 '
                  'Safari/537.36 Edg/107.0.1418.62',
    'Accept-Language': 'en-GB,en;q=0.9,en-US;q=0.8',
    'Accept-Encoding': 'gzip, deflate, br',
    'Referer': 'https://online.transport.wa.gov.au/webExternal/css/styles_licensing.css',
    'Host': 'online.transport.wa.gov.au',
    'Connection': 'keep-alive',
    'Accept': 'image/webp,image/apng,image/svg+xml,image/*,*/*;q=0.8',
    'DNT': '1',
    'Cache-Control': 'no-cache',
    'Pragma': 'no-cache',
    # Hard-coded session cookie copied from a browser session; session
    # cookies are transient, so this value will eventually go stale.
    'Cookie': 'JSESSIONID=70Pb5udz8enjqPcRk4OjOODjfORHr82eUe90pPSpitM83k2EWbRh!-1720409949!-1658794803; TS012ba7f5=0'
              '1becb1e5b6b9eda0a43a7a09fe67c8e8d893f7792d1d8270a95b4267bb1b1754adc3fc326aa6104de47ae36b87d71d4c1afa8f1'
              '73d2583279ddc291f0caef515f0f85c8ed',
}


value = {"plate": "1hdv242"}

r = requests.get(url, headers=headers, params=value)
#r = requests.post(url, headers=headers, params=value)


soup = BeautifulSoup(r.text, 'html.parser')

print(soup)

When I use the POST method, I can see the Rego in the HTML of the response, but I don't get any results.

What am I doing wrong?

This is not a Python question.
It is a question about the web app you are accessing.
I suspect that no one here knows about that web app, hence the lack of response.

Well, I've just tried this. If you POST to the site you get your form back with the plate number prefilled in the form input field; that's all.

That's because this form is heavily augmented with JavaScript, which does some kind of query-and-update to refresh the form contents inline. I do not know how amenable it is to noninteractive use.

However, your form data are lacking some fields, and maybe the server
logic supporting the form is stupidly picky.

Locate every input tag in the form and make sure your value dict contains a value for each input field, however pointless it may seem. The dict keys are the name= attributes from the form's input tags.
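For example, something like this (a minimal sketch, assuming the form's fields are present in the initial GET response and that plate is the one field you want to set):

import requests
from bs4 import BeautifulSoup

url = 'https://online.transport.wa.gov.au/webExternal/registration/'
soup = BeautifulSoup(requests.get(url).text, 'html.parser')
form = soup.find('form', attrs={'name': 'registrationRequestForm'})

# Seed the dict from the form's own inputs (including hidden fields),
# then override only the field we care about.
value = {tag['name']: tag.get('value', '')
         for tag in form.find_all('input') if tag.has_attr('name')}
value['plate'] = '1hdv242'
print(value)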

The form itself commences:

 <form action="./?0-1.IFormSubmitListener-layout-layout_body-registrationRequestForm" id="id3" method="post" name="registrationRequestForm">

so it expects to be used as a POST.

Cookie values are transient, usually, so your cookie field might now be
invalid. It is possible that this matters.

You can get a current cookie by using the cookie jar facility of the
requests module, visiting the page with a GET to fill in the cookie,
then including that cookie in the POST.
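A rough sketch of that, using requests.Session, which carries its cookie jar across requests automatically (the POST URL here is assumed from the form's action= attribute quoted above):

import requests

base = 'https://online.transport.wa.gov.au/webExternal/registration/'
with requests.Session() as s:
    # The GET fills the session's cookie jar with a fresh JSESSIONID.
    s.get(base)
    # The same session sends those cookies back automatically on the POST.
    r = s.post(base + '?0-1.IFormSubmitListener-layout-layout_body-registrationRequestForm',
               data={'plate': '1hdv242'})
    print(r.status_code)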

Cheers,
Cameron Simpson cs@cskk.id.au

Hi Cameron, thanks for the feedback. I will try your suggestion and hopefully get something back!

It seems like I'm not sending the request to the right page. It gets sent to the search page but goes no further, kind of thing.
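One quick way to confirm where the request actually lands (a hypothetical debugging snippet, reusing the url, headers and value names from the code above) is to inspect the redirect history:

r = requests.get(url, headers=headers, params=value)
print(r.status_code)   # HTTP status of the final response
print(r.url)           # the final URL after any redirects
print(r.history)       # the redirect chain, if any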

Questions about using non-stdlib, third-party modules may get more attention on stackoverflow.com, especially if the module has a tag (like '[python-requests]'), or on a module-specific list if there is one.


Thanks Terry, will post there as well 🙂

import requests
import re
import pandas as pd

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:107.0) Gecko/20100101 Firefox/107.0",
}


def main(url):
    with requests.Session() as req:
        req.headers.update(headers)
        # The initial GET picks up the session cookie; the server redirects
        # to a URL carrying the Wicket page id, which the next step needs.
        r1 = req.get(url)
        # Build the AJAX behaviour-listener URL for the form's search button.
        nurl = r1.url + "-1.IBehaviorListener.1-layout-layout_body-registrationRequestForm-searchButton="
        data = {
            "id3_hf_0": "",  # Wicket's hidden form field
            "plate": "1hdv243",
            "searchButton": 1
        }
        # These headers mark the request as a Wicket AJAX form submission.
        req.headers.update({
            'Wicket-Ajax': 'true',
            'Wicket-Ajax-BaseURL': '.'
        })
        r = req.post(nurl, data=data)
        # The AJAX reply embeds a relative "wicket..." URL pointing at the
        # rendered results page; pull it out and fetch it.
        match = url + re.search('(wicket.*?)]', r.text).group(1)
        r = req.get(match)
        # The results page holds a plain HTML table pandas can read directly.
        df = pd.read_html(r.content, attrs={'class': 'registrationTable'})[0]
        print(df.T)


main('https://online.transport.wa.gov.au/webExternal/registration/')

For anyone in the future: this code is an example of how to scrape a JavaScript-driven site. It was posted as an answer to my question on Stack Overflow by Ahmed American.