I would like to write a library for processing a simple language. The parser-level syntax I have in mind will be simple enough that packages like lark are definitely overkill, but that leaves the lexical side. I really, really don't want to reinvent the wheel by writing regexps for floating-point numbers, for instance. I had the clever idea of piggybacking on Python's own lexical syntax by using the tokenize standard library module, but unfortunately that won't work: tokenize enforces indentation rules, and my language should not treat whitespace as significant. That's a shame, because I'd be perfectly happy with all my tokens exactly matching Python's.
Among the many lexer generator packages on PyPI, are there any that provide pre-made definitions for more complex tokens such as floats?