Python using BeautifulSoup

Hi, I’m very new to Python and have written a program to scrape for baseball player bio data.

I made a soup object and extracted all the data to a list. I expect it to have 7 items in each record. Some of them don’t have all 7 items filled in, so my list is not coming out correctly. I have been manually adding to the list to get all the fields filled in. The problem is that the items in the original list are HTML tags containing text, and I can’t get my additions to have the same format. For example:

My soup argument is:
my_list2 = soup.find_all('span', class_='player-detail')
It gives me 209 elements and I'm expecting 210.
So I added the element using:
my_list2.insert(209, '<span class="player-detail"> no signer')

The problem I'm having is that the original data in the list is like this:
[<span class="player-detail"> Aug 14, 1998 </span>,

My added element comes out in the list like this:
  , '<span class="player-detail"> no signer',

How can I get it to not be enclosed in ' ' so that it is recognized like all the other elements?

Thanks and I’m sure this is a very newbie question
GMD

The soup.find_all() method returns a list (technically, it’s a bs4.element.ResultSet object, but it acts like a list) of bs4.element.Tag objects. You’re trying to insert a string into this list, which is why your inserted element has the surrounding quotes. Instead, you must insert another bs4.element.Tag object. To do this, you must use the string to create a bs4.element.Tag object and then insert the object into the list. Here’s an example of how to do this.

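import bs4

# Parse the page; naming the parser explicitly avoids a bs4 warning.
soup = bs4.BeautifulSoup('<html><body><span class="player-detail">this is a span</span></body></html>', 'html.parser')
my_list = soup.find_all('span', class_='player-detail')

# Parse a made-up HTML string and take its <span> as a Tag object.
item_to_append = bs4.BeautifulSoup('<span class="player-detail">this is another span</span>', 'html.parser').span
my_list.append(item_to_append)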

I’m probably not explaining it right. The soup item I started with is read directly from the website. It didn’t include enough information. So how do I make something up and get it inserted in there? The “item_to_append” line in your response looks like it would go back to the same website to get more information, which isn’t there.

Thanks so much for helping

In the example above, the item_to_append variable does represent an element that you make up; nothing is fetched from the website. You write an HTML string containing a <span> element and pass it to bs4.BeautifulSoup(), which parses that string. Taking .span from the result gives you a bs4.element.Tag object, which you can then add to the list you obtained from what was read directly from the website.
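
If it helps, here is a minimal sketch of the same idea using soup.new_tag(), which builds a Tag directly instead of parsing a second HTML string. The HTML string below is a made-up stand-in for your page, the name made_up is just for illustration, and I'm assuming the built-in 'html.parser' backend.

```python
import bs4

# Made-up stand-in for the page that was read from the website.
html = '<div><span class="player-detail"> Aug 14, 1998 </span></div>'
soup = bs4.BeautifulSoup(html, 'html.parser')
my_list2 = soup.find_all('span', class_='player-detail')

# Build a brand-new Tag object; nothing is fetched from the website here.
made_up = soup.new_tag('span', attrs={'class': 'player-detail'})
made_up.string = ' no signer '

# Insert the Tag (not a plain string) at the position where it belongs.
my_list2.insert(len(my_list2), made_up)

print(my_list2)
```

Because the inserted item is a Tag rather than a str, it prints without the surrounding quotes and supports the same methods (such as get_text()) as the items that came from the page.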

Thanks again for your help. Now that I understand what you were saying I got it to work.
I’m sure that if you saw all the code there is a much better way, but for my first attempt I’m pleased.

Thank you again for bailing me out!!
GMD

Hi, I have a question about the code I mentioned above. When I create the my_list object, ideally it would have 7 items in it for each player. In some cases, if there isn’t any data, the website just doesn’t put a field there. So instead of having 7 elements it may only have 6, and when it builds my list it is short some data. Is there a way to have it leave a list space blank or fill it with “no data” so that each time it reads a player’s record it has 7 data items, even if some of them are missing from the website?

Thanks
GMD

You could use the value None to indicate that there is no data. The following code would extend the length of my_list to 7 by appending the value None as many times as necessary.

if len(my_list) < 7:
    my_list.extend([None] * (7 - len(my_list)))

Adang,

When I read in the bs4 object “my_list”, if it had all the data there would be 210 items (30 players with 7 items each). When I read in the items on the webpage it says 209. I don’t know without looking through the data which player is missing data. Ideally, what I want is that when I read in the bs4 object it would replace a missing item with “none” so that it keeps the integrity of the list. Here is a copy of some of the data and how it is on the website. This one is missing data:

<div class="player-details">
       <div class="player-field">
        <span class="player-label">
         Born:
        </span>
        <span class="player-detail">
         Nov 12, 2003
        </span>
       </div>
       <div class="player-field">
        <span class="player-label">
         Bats:
        </span>
        <span class="player-detail">
         R
        </span>
        <span class="player-label">
         Throws:
        </span>
        <span class="player-detail">
         R
        </span>
       </div>
       <div class="player-field">
        <span class="player-label">
         Ht.:
        </span>
        <span class="player-detail">
         6'2"
        </span>
        <span class="player-label">
         Wt.:
        </span>
        <span class="player-detail">
         175
        </span>
       </div>
       <div class="player-field">
        <span class="player-label">
         Drafted/Signed:
        </span>
        <span class="player-detail">
         Dominican Republic, 2021.
        </span>
       </div>

This one is missing:
   <span class="player-label">Signed by: </span> and the corresponding 
  <span class="player-detail"> John Scout </span>

Thanks again for your help
GMD
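
One possible approach, sketched below, is to read the page one player at a time and look each value up by its label, so a missing field becomes None in the right position instead of shifting everything after it. This assumes every player sits in its own <div class="player-details"> and that the label text matches what you posted. The function name player_record, the EXPECTED_LABELS list, and the shortened sample HTML are all made up for illustration.

```python
import bs4

# Labels expected for every player, taken from the post above.
# The exact label text on the real site is an assumption.
EXPECTED_LABELS = ['Born:', 'Bats:', 'Throws:', 'Ht.:', 'Wt.:',
                   'Drafted/Signed:', 'Signed by:']


def player_record(details_div):
    """Return one value per expected label, with None for missing fields."""
    found = {}
    for label in details_div.find_all('span', class_='player-label'):
        # The value is the next player-detail span after its label.
        detail = label.find_next_sibling('span', class_='player-detail')
        found[label.get_text(strip=True)] = (
            detail.get_text(strip=True) if detail is not None else None
        )
    return [found.get(name) for name in EXPECTED_LABELS]


# Shortened, made-up sample: the second player has only a "Born:" field.
html = '''
<div class="player-details">
  <div class="player-field">
    <span class="player-label"> Born: </span>
    <span class="player-detail"> Aug 14, 1998 </span>
  </div>
  <div class="player-field">
    <span class="player-label"> Bats: </span>
    <span class="player-detail"> R </span>
    <span class="player-label"> Throws: </span>
    <span class="player-detail"> L </span>
  </div>
  <div class="player-field">
    <span class="player-label"> Signed by: </span>
    <span class="player-detail"> John Scout </span>
  </div>
</div>
<div class="player-details">
  <div class="player-field">
    <span class="player-label"> Born: </span>
    <span class="player-detail"> Nov 12, 2003 </span>
  </div>
</div>
'''

soup = bs4.BeautifulSoup(html, 'html.parser')
records = [player_record(div)
           for div in soup.find_all('div', class_='player-details')]

for record in records:
    print(len(record), record)
```

Each record comes out with 7 entries in the same order as EXPECTED_LABELS, so 30 players would give 30 lists of 7 rather than one flat list of 209. The values are plain strings here; if you need the Tag objects themselves, you could store detail instead of detail.get_text(strip=True).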