Question about string sequencing and slicing

Hi

I’ve just begun learning how to code in Python and have come across string sequencing and slicing. I’m having trouble understanding the concept of the 0. For example, if I were to sequence the string “Michael Jackson”, Python would assign M=0, i=1, c=2, h=3, a=4, and so on.

My issue is when I try to slice this sequence. I assign it with Name = “Michael Jackson” and then try to slice it by typing Name[0:4]. This is where I’m getting lost: instead of returning ‘Micha’ it returns ‘Mich’. Am I overthinking this, and the program only accesses the characters between 0 and 4 and calls it a day there, or is it supposed to return all the characters up to and including index 4?

Thanks in advance

This may help you to better understand what’s going on

| M | i | c | h | a | e | l |   | J | a | c | k | s | o | n |

| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10| 11| 12| 13| 14|

So, [0:4] returns a substring made of the characters from index 0 (included) up to index 4 (excluded).
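To make the index table concrete, here is a small sketch you can paste into the interpreter. The variable names are only chosen for illustration.

```python
name = "Michael Jackson"

first = name[0]        # 'M', index 0 is the first character
fifth = name[4]        # 'a', index 4 is the fifth character
mich = name[0:4]       # 'Mich', index 4 itself is excluded
micha = name[0:5]      # 'Micha', stop at 5 to include index 4

print(first, fifth, mich, micha)
```

So if you want the slice to include the character at index 4, the end has to be 5.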

The syntax for slicing is as follows:

string[start:end]

The substring always includes the character at the start and excludes the character at the end.

The start and end are optional. If you omit the start, it defaults to zero. If you omit the end, it defaults to the string’s length.
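A short sketch of those defaults, using the same string (the variable names are made up for this example):

```python
name = "Michael Jackson"

given = name[:7]       # start omitted, same as name[0:7]
family = name[8:]      # end omitted, same as name[8:len(name)]
whole = name[:]        # both omitted, the whole string

print(given)   # Michael
print(family)  # Jackson
print(whole)   # Michael Jackson
```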


That makes sense, thanks for your help!


You are very welcome.

The two rules that might be tripping you up are:

  1. Python starts counting from 0, not 1.
  2. Slices are “half-open”, so the starting position is included and the ending position is excluded.

For example, let’s look at your string ‘Michael’. I’m going to use the character | to show the positions where a slice cuts the string:

The indices are:

0 1 2 3 4 5 6 7
|M|i|c|h|a|e|l|

When you take the slice 'Michael'[0:4] Python cuts the string at the marked | at positions 0 and 4, which slices the string before the M and after the h (or before the a). And so the slice of the string you receive back is ‘Mich’.

Notice that this “half-open” rule makes it easy to work out how many characters you get back, by subtracting the start position from the end position: 4 - 0 gives us 4, and sure enough, ‘Mich’ has four characters.

So the slice [2:5] would have 5-2 = 3 characters and cut at the boundaries 2 and 5, giving ‘cha’. Try it and see if that is correct.

Can you correctly predict what the slice [1:6] returns? Hint: it has 6-1 = 5 characters.

If you remember that slicing positions are between characters, it becomes much easier to reason about slicing.
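Here is one way to check the “positions between characters” idea for yourself. It is only a sketch, and the names word, cut, start, end and piece are chosen for the example.

```python
word = "Michael"

# Cutting at one position splits the string into two pieces that
# join back into the original, with no character lost or repeated.
for cut in range(len(word) + 1):
    assert word[:cut] + word[cut:] == word

# For positions inside the string, the slice length is end minus start.
start, end = 2, 5
piece = word[start:end]
print(piece, len(piece))  # cha 3
```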


Thank you for taking the time to explain. It was the second point that had me confused: I understood what it was doing but was having trouble with the why. As Rob and yourself mentioned, it’s because it doesn’t count the ending position.

Just another quick question if you don’t mind.

I wanted to understand why Python takes str(‘1+2’) and returns it as ‘1+2’ instead of returning ‘3’.

Thank you

For the same reason that "Michael " + "Jackson" returns "Michael Jackson": you’re concatenating two strings.

If you want to perform an arithmetic operation, you’ll need a different data type, such as an integer.

>>> "Michael " + "Jackson"
'Michael Jackson'

>>> 1+2
3
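If the digits arrive as strings, one way to get arithmetic is to convert them with int() first. A minimal sketch, with variable names invented for the example:

```python
a = "1"
b = "2"

joined = a + b             # strings are concatenated
total = int(a) + int(b)    # integers are added

print(joined)  # 12
print(total)   # 3
```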

You have '1+2' which is a string because it is inside quotation marks ''.

Calling str() on a string doesn’t change it. A string is already a string.

Try leaving out the quotation marks:

'1+2'  # a string with the characters 1, +, 2
1+2    # no quotes --> an arithmetic expression

You don’t need to turn things into a string if you want to see the result:

print(1 + 2)  # prints 3
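Putting those pieces side by side, as a rough sketch (the variable names are just for illustration):

```python
text = '1+2'
same = str(text)        # still the string '1+2'; nothing is calculated
number = 1 + 2          # the arithmetic happens here, giving the int 3
as_text = str(number)   # converting the result gives the string '3'

print(same == text)     # True
print(as_text)          # 3
```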

Ah, that makes sense! That “Michael” + “Jackson” example drove the point home completely!
Thanks for the clarity once again!

Oh I see, the str in that example would be moot because they were already defined as a string by the quotation marks.

Can I assume that this is an absolute rule? Or are there cases where it will run it as an arithmetic expression?

Thank you for the help!

You mean if a string can be interpreted as an arithmetic expression, right?

Yes, it can. Python has the function eval() which evaluates a string as a Python expression:

>>> eval('1+2')
3

Important note: In production code, never pass an unsanitized string that could (even partially) come from an outside source[1] to eval(). Doing so leads to one of the most common types of security bug: injection. Note that correctly sanitizing a string can be quite difficult.
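For the curious, here is one possible way to evaluate simple arithmetic from a string without calling eval(). This is only a sketch under my own assumptions: the function name safe_arithmetic and the whitelist of operators are invented for this example, and it is not a complete or audited solution. It uses the standard ast module to parse the text and rejects anything that is not a number or one of the listed operators.

```python
import ast
import operator

# Operators this sketch is willing to apply (an assumption, extend with care).
_ALLOWED_BINOPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def safe_arithmetic(text):
    """Evaluate a string containing only numbers, + - * / and parentheses."""

    def _eval(node):
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        # Plain int or float literals only (bool is deliberately excluded).
        if isinstance(node, ast.Constant) and type(node.value) in (int, float):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _ALLOWED_BINOPS:
            func = _ALLOWED_BINOPS[type(node.op)]
            return func(_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -_eval(node.operand)
        # Names, function calls, attribute access, etc. all end up here.
        raise ValueError("unsupported expression")

    return _eval(ast.parse(text, mode="eval"))


print(safe_arithmetic('1+2'))  # 3
```

A string such as "__import__('os')" is parsed as a function call, which is not on the whitelist, so it raises ValueError instead of being run. Text that is not valid Python at all raises SyntaxError from ast.parse().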


Note that in Python not all expressions are arithmetic (taking numbers and giving a number as a result). In most Python programs, most of the expressions are not arithmetic. Examples:

'hello'                # expression with result of type str
"Michael" + "Jackson"  # again expression with result of type str
print('hello')         # expression with result None
                       # Printing of the text "hello" is a side-effect of print()
list(range(10))        # expression with result of type list
                       # ...but with this one we can work as a vector or a 1D matrix
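You can check the result types of those expressions yourself. A small sketch, where the names values and type_names are made up for this example:

```python
values = [
    'hello',
    "Michael" + "Jackson",
    print('hello'),        # prints hello, and the call's value is None
    list(range(10)),
]

type_names = [type(v).__name__ for v in values]
print(type_names)  # ['str', 'str', 'NoneType', 'list']
```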

  1. e.g. user input, configuration file etc. ↩︎


Not a worry: in making said point, I trust that I did not stab you with it.

As @vbrozik points out, sanitizing any user input is very important, and can be complex.

You may find this…

… to be of value.


If you put characters inside quotation marks '' or "", they will always be treated as a string.

There are two, not so much exceptions as wrinkles, on that:

  1. the eval() and exec() functions expect to be given a string, and they run it as Python code. That may include evaluating arithmetic expressions.
  2. so-called “f-strings” are a special short-cut for evaluating code written as a string and returning a string.

Even in these two cases, using quotation marks creates a string, so in that sense we can say there are no exceptions.
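To illustrate the f-string wrinkle, here is a tiny sketch (the variable names are only for the example). Note that the result is still a string in both cases.

```python
plain = '1+2'        # ordinary string: the characters 1, +, 2
fancy = f'{1+2}'     # f-string: the part inside {} is evaluated

print(plain)         # 1+2
print(fancy)         # 3
print(type(fancy))   # <class 'str'>
```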

You should ignore both of these advanced features until you have a better understanding of the basic features of Python.


@vbrozik

I see. I’ll be making a note of that as I progress onto more complicated features. When you mention that we can work with list(range(10)) as a vector, I’m assuming it’s the same thing as the matrix you mention and that they aren’t mutually exclusive.

@rob42 Not at all. Cleared up the confusion perfectly. Thank you for the link too, I’ll definitely be making use of it.

@steven.daprano

Yes, I will be making a note of those functions and revisit them when I am more confident in the fundamentals. I have to say, I’ve never been more excited to learn more in my life, who knew it could be this fun eh?

Thanks for your help!


Yes, I meant that it depends on how you interpret the data. A list is an ordered collection of objects. When we put numbers into a list, for example 3 numbers:

>>> list(range(3))
[0, 1, 2]

we can interpret the list as a vector in 3D space or as a 1-dimensional array containing 3 numbers.

I think I misused the word matrix. It seems matrices are always 2-dimensional in the sense that they have columns and rows.

The code with range() was just an example. Normally you would not initialize a vector using the range() function. You would put there the values you need:

vector = [2.5, 1.2, 6.7]

If you need to do computations with vectors and matrices, check the library NumPy.
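One thing worth knowing before you get that far: with plain lists, + joins them end to end, just as it does with strings. An element-by-element sum needs a loop or a comprehension. A small sketch in plain Python, with names chosen only for illustration:

```python
u = [1, 2, 3]
v = [10, 20, 30]

joined = u + v                           # concatenation, like strings
added = [a + b for a, b in zip(u, v)]    # element-by-element sum

print(joined)  # [1, 2, 3, 10, 20, 30]
print(added)   # [11, 22, 33]
```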


Hi Rob.

I’ve been reading through the w3schools site and have recently learned about for and while loops. My question is about while loops. In one of the examples, w3schools provides the following:
i = 1
while i < 6:
    print(i)
    i += 1

Now all of this makes sense to me, but I can’t seem to wrap my head around the last statement, i += 1. From my understanding it functions as an increment, like the step used in the range function. But why is it written this way?

I hope I’m not bothering you with these questions.
Thank you

No bother at all.

The i += 1 is commonly called ‘syntactic sugar’: it’s shorthand (if you will) for i = i + 1. Or, if you know C, it is like i++: increment i by 1.

In the future, please start a new topic when you are asking a question unrelated to the existing topic. This has nothing to do with string sequencing or slicing.

To answer your question, it is written as i += 1 because the person who wrote the code wanted to write it that way, and because they wanted to show you how to use a while loop. That is all.

i += 1 is just a shortcut for i = i + 1.
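To see what that line is doing in the loop, here is a sketch that collects the values instead of printing them, next to the same counting done with range(). The list names are made up for this example.

```python
# The while loop from the question, collecting values instead of printing.
seen_while = []
i = 1
while i < 6:
    seen_while.append(i)
    i += 1          # without this line, i stays 1 and the loop never ends

# The same counting done by range(), which handles the increment itself.
seen_for = []
for j in range(1, 6):
    seen_for.append(j)

print(seen_while)  # [1, 2, 3, 4, 5]
print(seen_for)    # [1, 2, 3, 4, 5]
```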

(There are some slight differences when the variable is a list or tuple, or similar, but they aren’t important here.)
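For later reference, here is a small sketch of the list case, with invented variable names. With a list, += changes the existing list in place, while the long form builds a new one.

```python
i = 1
i += 1              # for numbers, same result as i = i + 1

a = [1, 2]
b = a               # b is another name for the same list
a += [3]            # extends that list in place, so b sees the change

c = [1, 2]
d = c               # d is another name for the same list
c = c + [3]         # builds a new list for c, so d is left unchanged

print(i)  # 2
print(b)  # [1, 2, 3]
print(d)  # [1, 2]
```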

Oh right, so it’s just shorthand for i=i+1. Gotcha

@steven.daprano

Sorry, I’ll keep that in mind for future questions.

Thank you