Difficulty comprehending a block of code

Hi folks,

As a beginner learning Python, I am having trouble understanding the code below. The code is part of a game called “Hangman”. This part takes the user's input as a guess and checks that the guess is a single letter and nothing else. I follow everything up to the expression:

guess <= "9"

but this is where I am stuck. What I don't understand is how a string can be bigger or smaller than another character. Is that possible? As far as I know, only numbers can be bigger or smaller than each other; I have never heard of that for letters. Thanks in advance to anyone who can help.

Here is the code:

guess = input("This is the Hangman Word: " + display + " Enter your guess: \n")
guess = guess.strip()
if len(guess.strip()) == 0 or len(guess.strip()) >= 2 or guess <= "9":
    print("Invalid Input, Try a letter\n")

It seems (to me) to be a very crude attempt at catching input that is not alphabetic. I think (but I'm sure, and hope, that I'll be corrected if I'm wrong) that the expression guess <= "9" compares ASCII values. As the alphabetic characters all have an ASCII value greater than that of “9” (look up the ASCII values to see this), the test will catch digits, but it does not catch all non-alphabetic characters.
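To make that gap concrete, here is a small sketch that mirrors the condition from the posted code and lists which digits and punctuation characters it fails to reject. The function name original_check_rejects is made up for this illustration; it is not part of the original game.

```python
import string


def original_check_rejects(guess):
    """Mirror the posted condition: True means the game prints 'Invalid Input'."""
    guess = guess.strip()
    return len(guess) == 0 or len(guess) >= 2 or guess <= "9"


# Digits and ASCII punctuation: none of these should count as a letter guess.
non_letters = list(string.digits + string.punctuation)

# Characters the posted check wrongly accepts, because they sort after "9".
slipped_through = [c for c in non_letters if not original_check_rejects(c)]
print(slipped_through)
```

Everything from ":" onwards in the ASCII ordering gets past the test, which is why comparing against "9" alone is not enough.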

To add…

As I have a script that does this, I can save you the bother:

DEC 	 CHR 	   BIN 	 	 HEX
32  	    	 00100000 	 20
33  	  ! 	 00100001 	 21
34  	  " 	 00100010 	 22
35  	  # 	 00100011 	 23
36  	  $ 	 00100100 	 24
37  	  % 	 00100101 	 25
38  	  & 	 00100110 	 26
39  	  ' 	 00100111 	 27
40  	  ( 	 00101000 	 28
41  	  ) 	 00101001 	 29
42  	  * 	 00101010 	 2a
43  	  + 	 00101011 	 2b
44  	  , 	 00101100 	 2c
45  	  - 	 00101101 	 2d
46  	  . 	 00101110 	 2e
47  	  / 	 00101111 	 2f
48  	  0 	 00110000 	 30
49  	  1 	 00110001 	 31
50  	  2 	 00110010 	 32
51  	  3 	 00110011 	 33
52  	  4 	 00110100 	 34
53  	  5 	 00110101 	 35
54  	  6 	 00110110 	 36
55  	  7 	 00110111 	 37
56  	  8 	 00111000 	 38
57  	  9 	 00111001 	 39
58  	  : 	 00111010 	 3a
59  	  ; 	 00111011 	 3b
60  	  < 	 00111100 	 3c
61  	  = 	 00111101 	 3d
62  	  > 	 00111110 	 3e
63  	  ? 	 00111111 	 3f
64  	  @ 	 01000000 	 40
65  	  A 	 01000001 	 41
66  	  B 	 01000010 	 42
67  	  C 	 01000011 	 43
68  	  D 	 01000100 	 44
69  	  E 	 01000101 	 45
70  	  F 	 01000110 	 46
71  	  G 	 01000111 	 47
72  	  H 	 01001000 	 48
73  	  I 	 01001001 	 49
74  	  J 	 01001010 	 4a
75  	  K 	 01001011 	 4b
76  	  L 	 01001100 	 4c
77  	  M 	 01001101 	 4d
78  	  N 	 01001110 	 4e
79  	  O 	 01001111 	 4f
80  	  P 	 01010000 	 50
81  	  Q 	 01010001 	 51
82  	  R 	 01010010 	 52
83  	  S 	 01010011 	 53
84  	  T 	 01010100 	 54
85  	  U 	 01010101 	 55
86  	  V 	 01010110 	 56
87  	  W 	 01010111 	 57
88  	  X 	 01011000 	 58
89  	  Y 	 01011001 	 59
90  	  Z 	 01011010 	 5a
91  	  [ 	 01011011 	 5b
92  	  \ 	 01011100 	 5c
93  	  ] 	 01011101 	 5d
94  	  ^ 	 01011110 	 5e
95  	  _ 	 01011111 	 5f
96  	  ` 	 01100000 	 60
97  	  a 	 01100001 	 61
98  	  b 	 01100010 	 62
99  	  c 	 01100011 	 63
100  	  d 	 01100100 	 64
101  	  e 	 01100101 	 65
102  	  f 	 01100110 	 66
103  	  g 	 01100111 	 67
104  	  h 	 01101000 	 68
105  	  i 	 01101001 	 69
106  	  j 	 01101010 	 6a
107  	  k 	 01101011 	 6b
108  	  l 	 01101100 	 6c
109  	  m 	 01101101 	 6d
110  	  n 	 01101110 	 6e
111  	  o 	 01101111 	 6f
112  	  p 	 01110000 	 70
113  	  q 	 01110001 	 71
114  	  r 	 01110010 	 72
115  	  s 	 01110011 	 73
116  	  t 	 01110100 	 74
117  	  u 	 01110101 	 75
118  	  v 	 01110110 	 76
119  	  w 	 01110111 	 77
120  	  x 	 01111000 	 78
121  	  y 	 01111001 	 79
122  	  z 	 01111010 	 7a
123  	  { 	 01111011 	 7b
124  	  | 	 01111100 	 7c
125  	  } 	 01111101 	 7d
126  	  ~ 	 01111110 	 7e
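
For anyone who wants to regenerate a table like this, a minimal sketch follows. It is not the script that produced the listing above, just one plausible way to do it; the function name ascii_table and the tab-separated layout are assumptions.

```python
def ascii_table(start=32, stop=127):
    """Return tab-separated rows for the printable ASCII range (32 to 126)."""
    rows = ["DEC\tCHR\tBIN\tHEX"]
    for code in range(start, stop):
        # chr() turns the number into its character; the format specs
        # give an 8-digit binary and a 2-digit lowercase hex rendering.
        rows.append(f"{code}\t{chr(code)}\t{code:08b}\t{code:02x}")
    return rows


print("\n".join(ascii_table()))
```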

One alternative, based on the code you posted, is this:

guess = input(f"This is the Hangman Word: {display} Enter your guess: ").lower().strip()

if len(guess) != 1 or guess < "a" or guess > "z":
    print("Invalid Input. Try a letter\n")
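
The same idea can be wrapped in a small helper so the check is reusable and testable. This is only a sketch: normalize_guess is a hypothetical name, and it uses a membership test against string.ascii_lowercase instead of the two comparisons. (str.isalpha() is deliberately not used here, because it also accepts non-ASCII letters such as "é".)

```python
import string


def normalize_guess(raw):
    """Return the guess as one lowercase ASCII letter, or None if invalid."""
    guess = raw.strip().lower()
    # Exactly one character, and that character is one of a-z.
    if len(guess) == 1 and guess in string.ascii_lowercase:
        return guess
    return None


for sample in [" A ", "9", "ab", "", "{"]:
    print(repr(sample), "->", normalize_guess(sample))
```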

Yes, it is possible. Strings (not just individual characters) are ordered alphabetically (at least for English words).

Every character has an ordinal value:

>>> ord('a')
97
>>> ord('b')
98

The ordinal values range from 0 to 1114111, but most of them are unused. If you are just using English words and punctuation, you will get ordinal values between (approximately) 32 and 128 or so.

When comparing two individual characters, the character with the lower ordinal value comes first, so 'a' < 'b'.

When comparing two strings, they are compared one pair of characters at a time, like dictionary order. 'cat' < 'cattle' just as ‘cat’ comes before ‘cattle’ in the dictionary.
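
As an illustration of that rule, here is a hand-written version of the comparison. It is a sketch of the behaviour described above, not how Python implements it internally, and the name compare_like_python is made up.

```python
def compare_like_python(left, right):
    """Return True if left < right, comparing one pair of characters at a time."""
    for a, b in zip(left, right):
        if a != b:
            # The first differing pair decides the result.
            return ord(a) < ord(b)
    # All shared positions are equal, so the shorter string comes first.
    return len(left) < len(right)


pairs = [("cat", "cattle"), ("cattle", "cat"), ("cat", "dog"), ("Zebra", "apple")]
for left, right in pairs:
    print(left, right, compare_like_python(left, right), left < right)
```

The last pair shows that uppercase letters sort before lowercase ones, since their ordinal values are lower.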

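If you are writing in English, you should find that string comparisons match dictionary order, provided the letters are all the same case (every uppercase letter has a lower ordinal value than every lowercase letter). I think that the same applies to Greek or Russian strings. Other languages may not be so fortunate.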

By the way, you can go the other way using chr: chr(97) will return the letter 'a'.
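
A short sketch of that round trip, using only the built-in ord and chr; the variable names are arbitrary.

```python
# ord() and chr() are inverses of each other.
letter = "a"
code = ord(letter)
round_trip = chr(code)

# Stepping through the ordinals rebuilds the lowercase alphabet.
alphabet = "".join(chr(n) for n in range(ord("a"), ord("z") + 1))

print(code, round_trip, alphabet)
```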

Indeed. The Icelandic ð (eth), for example, is collated (that is, ordered) after d, but its Unicode code point is U+00F0, which does not fall between the code points for d and e. On a Unix/Linux system, LC_COLLATE governs the order of string sorts.

>>> ord("c")
99
>>> ord("d")
100
>>> ord("\u00F0")
240

I don’t think that affects Python sorting though, only the external Unix sorting tools.

You are right again. I am disappointed:

>>> a = "afghzni\u0133qw" 
>>> sorted(a)
['a', 'f', 'g', 'h', 'i', 'n', 'q', 'w', 'z', 'ij']

ij should be sorted before z.
Because:

>>> a = "\u0133zer"
>>> b = "zonder"
>>> a < b
False

… and that is not correct.

It did not sit well with me, the answer I gave to your message. So I looked further. An apology to the core devs is required, because the solution is in the locale module.
To recap:

>>> import locale
>>> a = "\u0133zer"
>>> b = "zonder"
>>> sorted([a, b])
['zonder', 'ijzer']

… which is the wrong order.
But:

>>> import functools as fun
>>> sorted([a, b], key=fun.cmp_to_key(locale.strcoll))
['ijzer', 'zonder']

… is correct, only a bit long-winded.

Remember to first set the default collation locale via locale.setlocale(locale.LC_COLLATE, "").

locale.strcoll() is implemented by C wcscoll(), which depends on the current LC_COLLATE category. Python sets the LC_CTYPE category to the configured default, but other categories should initially be the “C” locale. In this case, I’d expect collation by ordinal. That’s what I get in both Windows and Linux, in which “zonder” sorts before “ijzer”, until the LC_COLLATE category is set to the configured default.
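
Putting the two posts together, a minimal sketch might look like the following. The helper name locale_sorted is hypothetical, and which locale names are available (for example "nl_NL.UTF-8") depends on the system, so the Dutch call is only shown as a comment.

```python
import functools
import locale


def locale_sorted(words, locale_name=""):
    """Sort words using the collation rules of the given locale.

    An empty locale_name selects the user's configured default.
    The previous LC_COLLATE setting is restored afterwards.
    """
    previous = locale.setlocale(locale.LC_COLLATE)  # query the current setting
    try:
        locale.setlocale(locale.LC_COLLATE, locale_name)
        return sorted(words, key=functools.cmp_to_key(locale.strcoll))
    finally:
        locale.setlocale(locale.LC_COLLATE, previous)


words = ["\u0133zer", "zonder"]

# In the "C" locale, collation is expected to follow ordinal values.
in_c_locale = locale_sorted(words, "C")

# Assumption: a Dutch locale is installed under this name on your system.
# in_dutch = locale_sorted(words, "nl_NL.UTF-8")
```

Note that setlocale changes process-wide state, so restoring the previous setting as done here is a courtesy to the rest of the program, not a guarantee of thread safety.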

Thank you for completing this.