I have a text file containing 10,000 log lines like this:
2022-12-27T00:00:00+00:00 VM1 sshd[25690]: pam_unix(sshd:session): session closed for user .
The main tasks are:
- Keep only the lines that contain any of these words: ['unauthorized', 'error', 'kernel error', 'OS error', 'rejected', 'warning'].
- Split each matching line into its parts and store the result in a DataFrame using Apache Beam.
I am using Apache Beam version 2.44.0 and coding in a Jupyter Notebook.
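
To make the two tasks concrete, here is a minimal sketch of one possible approach, not a confirmed solution. It assumes every line follows the layout of the sample above (timestamp, host, process with an optional PID in brackets, then the message), that matching should be case-insensitive, and that the notebook can use Beam's `InteractiveRunner` with `interactive_beam.collect` to get a pandas DataFrame back. The names `has_keyword`, `parse_line`, `logs_to_dataframe`, and the column names are hypothetical, chosen only for illustration.

```python
import re

# Keywords from the task list. With substring matching, "kernel error" and
# "OS error" are already covered by "error"; they are kept for clarity.
KEYWORDS = ["unauthorized", "error", "kernel error", "OS error", "rejected", "warning"]
KEYWORD_RE = re.compile("|".join(re.escape(k) for k in KEYWORDS), re.IGNORECASE)

# Assumed line layout: "<timestamp> <host> <process>[<pid>]: <message>"
LINE_RE = re.compile(
    r"^(?P<timestamp>\S+)\s+(?P<host>\S+)\s+"
    r"(?P<process>[^\s\[:]+)(?:\[(?P<pid>\d+)\])?:\s*(?P<message>.*)$"
)


def has_keyword(line):
    """Return True if the line contains any keyword (case-insensitive)."""
    return KEYWORD_RE.search(line) is not None


def parse_line(line):
    """Split one log line into a dict of fields, or None if it does not match."""
    match = LINE_RE.match(line.strip())
    return match.groupdict() if match else None


def logs_to_dataframe(path):
    """Filter and parse the file at `path` with Beam, returning a DataFrame."""
    # Imported here so the helpers above stay usable without Beam installed.
    import apache_beam as beam
    from apache_beam.runners.interactive import interactive_beam as ib
    from apache_beam.runners.interactive.interactive_runner import InteractiveRunner

    pipeline = beam.Pipeline(InteractiveRunner())
    parsed = (
        pipeline
        | "Read" >> beam.io.ReadFromText(path)
        | "KeepMatches" >> beam.Filter(has_keyword)
        | "Parse" >> beam.Map(parse_line)
        | "DropUnparsed" >> beam.Filter(lambda row: row is not None)
    )
    # Make the locally defined pipeline visible to Interactive Beam.
    ib.watch(locals())
    # collect() materializes the PCollection as a pandas DataFrame.
    return ib.collect(parsed)
```

In a notebook cell this would be called as `df = logs_to_dataframe("logs.txt")`, where `logs.txt` is a placeholder file name. Filtering before parsing keeps the regex parse off the lines that will be discarded anyway.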