Splitting a string dynamically

@mlgtechuser

Yes, I’ll take a step back and think about how to explain it again.

The source data is from an XML file, which I parse, writing the tags to a list. I then loop through the list and get the strings like the ones I’ve previously posted. These contain expressions, formulas, or whatever you want to call them. I then have to reformat them and write them to another XML file with a different syntax, to be used by a different application.

I don’t want my script to evaluate these expressions, merely rewrite them if needed, so they are treated as strings.
There are numerous permutations and I can handle them all bar this range / nonrange combination.
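For anyone following along, here is a minimal sketch of the "parse the XML and collect the tag text into a list" step described above. The tag name `expr`, the root element `rules`, and the sample document are all hypothetical, since the real XML#1 layout has not been posted. Nothing is evaluated; the expressions stay plain strings.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for XML#1; the real tag names are not known.
SOURCE_XML = """
<rules>
  <expr>(FS22 &gt; 15) &amp;&amp; (FS22 &lt; 46) || (FS33 &gt; 0.0)</expr>
  <expr>(DD11 &lt;= 15)</expr>
</rules>
"""


def collect_expressions(xml_text, tag="expr"):
    """Return the text of every <tag> element as a plain string."""
    root = ET.fromstring(xml_text.strip())
    return [elem.text.strip() for elem in root.iter(tag) if elem.text]


expressions = collect_expressions(SOURCE_XML)
for expression in expressions:
    # The parser has already unescaped &gt;, &lt; and &amp;.
    print(expression)
```

The list can then be looped over and each string reformatted before being written to the second file.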

@cheesebird Feel free to use my “specification” of your expressions. Copy it, edit it, show it here.
Edit: To access the source of my post, select the text and press “Reply”. Then you can remove the reply tags.

Ron, Vacláv is valiantly trying to reverse engineer a spec on your source data. Can you throw us a bone and provide a spec?

He’s doing it because the missing data encoding spec is a glaring omission that’s fouling any real progress.

Anything we reverse engineer will involve so much guessing that it’s sure to have faults and incorrect assumptions–and probably significant ones.

Engineers find this sort of challenge to be quite fun, as you can see, but it’s no substitute for the real specification on your source data and its encoding system.

In reply to what’s been asked here: if you can’t provide the information for proprietary reasons, then the only other option that I can see is to follow the guidelines here…

@vbrozik

Thanks for the post. To clarify I am not trying to evaluate these expressions in the python script.

These expressions are parsed from a tag in XML#1, which is used on system1. I take the strings containing the expressions, reformat them, and write them to XML#2 for system2, which requires a different format.

All I am trying to do is reformat a string. The operators are simply characters in a string in this case. The expressions will be evaluated on system2.

There are tens of operators, such as hex2dec, which I can handle in my conversion code, so that is not the issue.

Only the range / nonrange string combination is holding me back. So this is a pure string manipulation scenario.

That was a good high-level summary of the overall situation. However…

I mostly type backticks [```] (grave accent) for code blocks and [`] inline monospace and asterisks for italic and bold. On my Discourse account at meta.discoirse.com I had to click something at top right of the editor’s frame to reveal the editing toolbar. It has hyperlinking and some other handy functions.

Like...

…this folding text block that I should probably use more often.

You have a gift for understatement, Ross. :smile:


@cheesebird In general to change something in an expression, you need to understand the context you are changing in. To understand the context, you usually need to parse the expression.

No one is talking about full evaluation of the expressions. We do not even know where the values of the (still unconfirmed :smiley: ) variables are.

So, do you have a specification for the source XML file data, Ross?

What does the file’s header say about the system that generated it? There’s probably some data format breadcrumbs or something…

Maybe this will help if I show the desired output from the original post strings.

string1 = "(FS22 > 15) && (FS22 < 46) || (FS33 > 0.0)"
string2 = "(FS33 > 0.0) || (FS99> 15) && (FS99 < 46) || (FS38 > 0.0)"
string3 = "(ES33 > 0.0) || (ES34> 5) && (ES34 < 16) || (EZ99 > 0.0)  && (ES39> 15)"

String1 should be split like this …

FS22 <>15:46
FS33 > 0.0

String2 expected output

FS33 > 0.0
FS99 <>15:46
FS38  > 0.0

String3 expected output…

ES33 > 0.0
ES34 <> 5:16
EZ99 > 0.0
ES39 >15

All split up nicely to be written to another XML tag.

The range expression is denoted by a duplicated substring, e.g. FS22 in string1.

Note that we can’t carry the && or || operators over to system2, as they are not used there; hence the splitting requirement.
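As a rough sketch of the splitting described above (not a complete solution, given that there is no spec): pull out every parenthetical with a regex, then merge two neighbouring clauses into a range line when they share the same name and the operators are `>` followed by `<`. The function name `split_expression`, the clause pattern, and the output spacing `NAME <> low:high` are my assumptions, based only on the three example strings. It does not check whether the two clauses were joined by && rather than ||.

```python
import re

# One simple parenthetical such as "(FS99> 15)": name, operator, number.
# Assumption: only these four operators and plain decimal numbers occur.
CLAUSE = re.compile(r"\(\s*(\w+)\s*(<=|>=|<|>)\s*([\d.]+)\s*\)")


def split_expression(expression):
    """Return one output line per clause, merging 'X > a' + 'X < b' pairs."""
    clauses = CLAUSE.findall(expression)
    lines = []
    i = 0
    while i < len(clauses):
        name, op, value = clauses[i]
        following = clauses[i + 1] if i + 1 < len(clauses) else None
        # A range is assumed to be the same name twice in a row, > then <.
        if following and following[0] == name and op == ">" and following[1] == "<":
            lines.append(f"{name} <> {value}:{following[2]}")
            i += 2
        else:
            lines.append(f"{name} {op} {value}")
            i += 1
    return lines


string3 = "(ES33 > 0.0) || (ES34> 5) && (ES34 < 16) || (EZ99 > 0.0)  && (ES39> 15)"
for line in split_expression(string3):
    print(line)
# ES33 > 0.0
# ES34 <> 5:16
# EZ99 > 0.0
# ES39 > 15
```

Because the regex tolerates missing spaces around the operator, inputs like `(ES34> 5)` need no prior clean-up. Anything that does not look like a simple parenthetical is silently skipped, which is probably not what you want in production.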

That’s a significant part of what I asked for. Thank you.

Someone designed the data structure and decided what && and || mean. They probably wrote their design down. What do we know about who that was and where they wrote down the design specifics?

I’ll have to check, but on system2 we cannot use && or || in the XML files. That is why I need to split by the alphanumeric substring.

That’s fine. We have to know clearly and positively what || and && ARE in the encoding scheme in order to predict where they can and cannot show up in the strings. This is STEP 1 in my post in the other topic.

(10 < DD99 && DD99 < 15) && (DD88 > 20) &&
( > 15 DD67 && DD67 18 <) || (DD11 <= 15)

Here’s a stab at decoding the encoding…

  1. There are three types of data unit:
    – individual simple parenthetical: (VAR > 3)
    – individual complex parenthetical: (VAR > 3 && VAR < 5)
    – compound parenthetical clauses made up of two or more(?) parentheticals:
    (VAR1 = 3) && (VAR2 > 1)

  2. && and || are ‘AND’ and ‘OR’ operators. They can appear in two types of location:
    – between clauses
    – between repeated variables inside of parenthetical clauses.
    || only appears between parentheticals (does not appear inside of them).

  3. Each parenthetical clause has one or more, max 2(?), numeric values, which can be integer or floating-point decimal.

  4. The clauses contain one or more, max 2(?), standard comparison operators and two operands. The two operands may be combined with AND or OR (&& or ||).

  5. Parenthetical clauses are single-level (not nested).
    __________
    That’s as far as I’m willing to go without more (and complete) strings to evaluate or definition of the expression elements. The above is only observations of very limited examples and already involves way too much guessing and assumption.
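One way to test those guesses against the real data is a small checker that reports any string breaking them. This is only a sketch of observations 2 and 5 (single-level parentheticals, && or || between clauses, no || inside a clause); the function names are made up for illustration and the rules are the guessed ones above, not a confirmed spec.

```python
import re

# A single-level parenthetical: no parentheses inside it.
PAREN = re.compile(r"\(([^()]*)\)")


def split_top_level(expression):
    """Return (clause bodies, text found between or around the clauses)."""
    clauses, connectors = [], []
    position = 0
    for match in PAREN.finditer(expression):
        between = expression[position:match.start()].strip()
        if between:
            connectors.append(between)
        clauses.append(match.group(1).strip())
        position = match.end()
    trailing = expression[position:].strip()
    if trailing:
        connectors.append(trailing)
    return clauses, connectors


def check_observations(expression):
    """List every way the string departs from the guessed rules."""
    problems = []
    clauses, connectors = split_top_level(expression)
    for connector in connectors:
        # Stray parentheses from nesting also end up here.
        if connector not in ("&&", "||"):
            problems.append(f"unexpected text between clauses: {connector!r}")
    for clause in clauses:
        if "||" in clause:
            problems.append(f"'||' inside a parenthetical: {clause!r}")
    return problems


print(check_observations("(VAR1 = 3) && (VAR2 > 1)"))
print(check_observations("((A > 1) && (B < 2))"))
```

Running it over all the source strings would show quickly which of the guesses hold and which need revising.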

Both of these operators should be dropped, as we can’t use them on system2.

So it’s basically one alphanumeric name, one operator, and one value per line.

You’re correct, though: && is “and” and || is “or”.

All I need to do is split the string; I don’t care whether it’s “and” or “or”.

The only difficulty is a range statement in a nonrange string.

Not quite a true statement… I get that we can ignore && and || in the output string assembly. Nevertheless, you need to split the string in a very intelligent way so that you can reassemble it for use in a specific encoding system. I assume XML file 2 is being read and processed by a computer program, so that program is probably going to have some very fixed ideas about what it expects to see. Am I right?

That is 100% correct. Once I break down these original strings I simply feed the values into the new tags for XML2 format.

It only needs what I have already posted like this…

DD33 > 10
DD88 > 50
DD23 <> 10:20

My code does the rest.

I’ve already parsed and generated hundreds of these strings, but I’m stuck on this last one.

Well done on the progress. You’ve applied some pretty sophisticated technologies to this project.

“If at all” refers to fully working. You probably appreciate at this point that an incomplete solution is not really a solution.


Is this omitted space in 34> a typo?

There are some examples of this happening in the original strings, but I do a broad replace to avoid this:

string1 = string1.replace('>', ' > ')  # str.replace returns a new string

So this one in my example is a typo.

I also couldn’t find a way to search for such occurrences and then only replace if needed. Spent many happy hours battling with regex on this.
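For what it’s worth, a regex can absorb whatever whitespace is already around the operator and put back exactly one space on each side, so there is no need to check first whether a replacement is needed. A minimal sketch, assuming only `<`, `>`, `<=` and `>=` occur (the name `normalize_operator_spacing` is just for illustration):

```python
import re

# Longer operators come first so ">=" is not split into ">" and "=".
OPERATOR = re.compile(r"\s*(<=|>=|<|>)\s*")


def normalize_operator_spacing(expression):
    """Put exactly one space on each side of every comparison operator."""
    return OPERATOR.sub(r" \1 ", expression)


print(normalize_operator_spacing("(ES34> 5) && (ES34 < 16)"))
# (ES34 > 5) && (ES34 < 16)
```

Unlike the broad `replace`, this leaves already-spaced operators unchanged and does not break `>=` apart.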

Are you looking at the number of white spaces?