Designing an output capturer

Dear Python Experts.

I use a library from my Python script. This library prints text to the terminal that I need to parse, but it does not offer a way to retrieve that text, so I have to copy and paste it into a text file myself.

I would like to build a mechanism that stores the text in a file that I can parse later, while still printing it to the screen as it normally does. ChatGPT suggests something like:

import sys 
import time

class OutputCapturer:
    def __init__(self):
        self.content = ''

    def write(self, text):
        self.content += text
        sys.stdout = sys.__stdout__
        sys.stdout.write(text)
        sys.stdout = self 

capturer   = OutputCapturer()
sys.stdout = capturer

for i in range(5):
    print(f"Iteration {i}")
    time.sleep(1)

sys.stdout = sys.__stdout__

So for every piece of text that goes to the terminal, I am switching sys.stdout back and forth between my OutputCapturer and the system stdout, and at the end I have to restore sys.stdout myself. This seems a little cumbersome. Is this the cleanest way to do what I need?

Cheers.

How exactly do you run it? The approach will be completely different for code that is imported and used, versus code run in a subprocess, for example.

Have you checked the documentation for any alternative ways to get the output?

Normally, something that works by “dumping text to the terminal” is not considered a “library”. The point of a “library” is to provide functionality that can return the information back to your code.

This code makes very little sense, and you should not ask ChatGPT to write code for you. Really, you should not use any AI to write code for you (except perhaps for suggestions in an IDE), and ChatGPT is not even designed for the task, whereas certain other AIs are.


Dear Karl,

Thank you for your answer.

The code is imported.

Yes, and there do not seem to be any. That particular output does not get stored anywhere. We are trying to move away from that library; however, we are under time constraints, and within our time frame we have to work with what we have.

HAHAHAHAHA, yes, I do not trust it; I always try to understand it first. I am testing it, and the code I pasted above seems to work, but I think there must be a better way of doing this.

Cheers.

Which library is this? It would help to know.

Off the top of my head, it’s easy to capture output from subprocesses, either combining or separating stdout and stderr, using subprocess.run.
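A minimal sketch of the subprocess route, assuming the ROOT training code is moved into its own script. `subprocess.run` with `stdout=subprocess.PIPE` and `stderr=subprocess.STDOUT` merges both streams into one string. Because the child's file descriptors are captured, this also picks up text written by C++ code, not just by Python `print`. `capture_subprocess_output` is a hypothetical helper name, and the text only becomes available once the process has finished.

```python
import subprocess
import sys


def capture_subprocess_output(cmd):
    """Run cmd, echo its combined stdout and stderr, and return it as a string."""
    completed = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,     # capture the child's stdout
        stderr=subprocess.STDOUT,   # merge stderr into the same stream
        text=True,                  # decode bytes to str
        check=True,                 # raise CalledProcessError on a non-zero exit
    )
    sys.stdout.write(completed.stdout)  # still show the text on screen
    return completed.stdout
```

For example, `capture_subprocess_output([sys.executable, "train.py"])`, where `train.py` is a hypothetical script holding the training code, returns everything that script printed.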

Otherwise, to do it all in the same Python process, I’d have a play with contextlib.redirect_stdout.
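A minimal sketch of the in-process route with `contextlib.redirect_stdout`, capturing into an `io.StringIO` and echoing the text afterwards. `run_and_capture` is a hypothetical helper name. Note that this only sees text written through Python's `sys.stdout` (`print` and friends); output written directly by compiled code to the underlying file descriptor is not captured.

```python
import io
from contextlib import redirect_stdout


def run_and_capture(func, *args, **kwargs):
    """Call func, capturing anything it prints via sys.stdout."""
    buffer = io.StringIO()
    with redirect_stdout(buffer):   # sys.stdout is restored on exit, even on errors
        result = func(*args, **kwargs)
    text = buffer.getvalue()
    print(text, end="")             # echo the captured text to the real screen
    return result, text
```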

Even if it works, the code you showed is extremely poor code. The weird dance inside the write() method is bizarre and pointless, and liable to cause problems. ChatGPT is a terrible source of program code; you will do far better to write the code yourself than to try to massage its nonsense into something useful.
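For comparison, a writer whose `write()` simply forwards each piece of text to two streams, with no reassignment of `sys.stdout` inside it, can be sketched as follows. It is installed with `contextlib.redirect_stdout`, so `sys.stdout` is restored automatically even if an exception occurs. `Tee` and `captured.txt` are hypothetical names, and as with any `sys.stdout` replacement, it only sees output that goes through Python.

```python
import io
import sys
from contextlib import redirect_stdout


class Tee:
    """File-like object that copies everything written to it to several streams."""

    def __init__(self, *streams):
        self.streams = streams

    def write(self, text):
        for stream in self.streams:
            stream.write(text)
        return len(text)

    def flush(self):
        for stream in self.streams:
            stream.flush()


if __name__ == "__main__":
    # Tee(sys.stdout, log) is built before redirect_stdout swaps sys.stdout,
    # so it holds on to the real stdout.
    with open("captured.txt", "w") as log, redirect_stdout(Tee(sys.stdout, log)):
        for i in range(5):
            print(f"Iteration {i}")
```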

Hi James,

Thanks for your answer. The library is ROOT; it is a C++ library but also has Python bindings. We are trying to get the variable importance table from training a machine learning algorithm with:

https://root.cern/doc/master/classTMVA_1_1Factory.html

The documentation is also not very good.
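Since ROOT is a C++ library, text it prints with `std::cout` goes straight to the process's file descriptor 1 and bypasses Python's `sys.stdout`, so neither the `OutputCapturer` class nor `contextlib.redirect_stdout` will see it. A minimal sketch of capturing at the file-descriptor level instead, assuming a POSIX system (Linux or macOS), since `ctypes.CDLL(None)` is used to reach the C library's `fflush`. `capture_fd1` is a hypothetical helper name, and the captured text is echoed to the screen only after the block ends, not live.

```python
import ctypes
import os
import sys
import tempfile
from contextlib import contextmanager


@contextmanager
def capture_fd1(echo=True):
    """Send everything written to file descriptor 1 into result["text"].

    Covers Python prints as well as C/C++ writes to stdout.
    POSIX only (assumption): ctypes.CDLL(None) loads the C library.
    """
    result = {"text": ""}
    libc = ctypes.CDLL(None)
    sys.stdout.flush()            # push out anything Python has buffered so far
    libc.fflush(None)             # same for C stdio buffers
    saved_fd = os.dup(1)          # keep a handle on the real stdout
    with tempfile.TemporaryFile() as tmp:
        os.dup2(tmp.fileno(), 1)  # fd 1 now points at the temporary file
        try:
            yield result
        finally:
            sys.stdout.flush()    # Python-level buffered output
            libc.fflush(None)     # C-level buffered output
            os.dup2(saved_fd, 1)  # restore the real stdout
            os.close(saved_fd)
            tmp.seek(0)
            result["text"] = tmp.read().decode(errors="replace")
            if echo:              # show the captured text after the fact
                sys.stdout.write(result["text"])
                sys.stdout.flush()
```

The ROOT call that prints the table would go inside `with capture_fd1() as out:`, and `out["text"]` can then be written to a file or parsed.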


What’s the return value from the call to TMVA::Factory::GetImportance? The C++ code is indeed printing some information to stdout, but at first glance very similar information (and more) is returned. Maybe it gets lost in the Python binding, but any method named GetSomething ought to return something.


      std::cout << "--- " << varNames[i - 1] << " = " << roc << " %" << std::endl;
      vih1->GetXaxis()->SetBinLabel(i, varNames[i - 1].Data());
      vih1->SetBinContent(i, roc);

      ...

      return vih1;
}

https://root.cern/doc/master/tmva_2tmva_2src_2Factory_8cxx_source.html#l02591
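Once the text is captured, lines in the format produced by the `std::cout` statement above (`"--- " << name << " = " << roc << " %"`) can be picked out with a regular expression. This is a sketch that assumes the format matches that line exactly and that variable names contain no spaces; `parse_importances` is a hypothetical helper name.

```python
import re

# Matches lines such as "--- var1 = 61.5 %" (format taken from the excerpt above).
IMPORTANCE_LINE = re.compile(r"^---\s+(\S+)\s+=\s+(\S+)\s*%$")


def parse_importances(text):
    """Return {variable name: importance} from captured output."""
    importances = {}
    for line in text.splitlines():
        match = IMPORTANCE_LINE.match(line.strip())
        if not match:
            continue
        name, value = match.groups()
        try:
            importances[name] = float(value)
        except ValueError:
            continue  # not a number, so not an importance line after all
    return importances
```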

Hello James,

The return value is a pointer to a TH1 instance. TH1 is ROOT's histogram class: a structure that stores a value (the bin content) for each position of something else (the bin). In this case, the function takes the names of the features and the importances as arguments:

https://root.cern/doc/master/tmva_2tmva_2src_2Factory_8cxx_source.html#l02591

That is, I need to have them already. The strange thing is that this function only reads the values from the C++ vectors, yet they are not passed as const references (i.e. references that can only be read, not modified); instead the objects themselves are passed by value, making an unnecessary copy. That does not tell me anything good about this code, HAHAHAHAHA.

Cheers.

Sure, but what does the binding make that pointer into in Python?

I’m sure some users find ROOT to be great. But there are a lot of ML training frameworks out there.

In Python I think this is just another TH1 object. When I check, I see:

>>> import ROOT
>>>
>>> obj=ROOT.TH1F()
>>> 
>>> type(obj)
<class cppyy.gbl.TH1F at 0x563a087b01f0>
>>> 
>>> 

So it’s a cppyy.gbl object that seems to wrap the actual histogram. I do not know how the binding that makes C++ classes usable from Python works.

What methods and properties does it have? What does dir(obj) show?

It says:

>>> dir(h)
['AbstractMethod', 'Add', 'AddAt', 'AddBinContent', 'AddDirectory', 'AddDirectoryStatus', 'Adopt', 'AndersonDarlingTest', 'AppendPad', 'At', 'Browse', 'BufferEmpty', 'CanExtendAllAxes', 'CheckedHash', 'Chi2Test', 'Chi2TestX', 'Chisquare', 'Class', 'ClassName', 'Class_Name', 'Class_Version', 'Clear', 'ClearUnderflowAndOverflow', 'Clone', 'Compare', 'ComputeIntegral', 'Copy', 'DeclFileLine', 'DeclFileName', 'Delete', 'Dictionary', 'DirectoryAutoAdd', 'DistancetoLine', 'DistancetoPrimitive', 'Divide', 'Draw', 'DrawClass', 'DrawClone', 'DrawCopy', 'DrawNormalized', 'DrawPanel', 'Dump', 'Error', 'Eval', 'Execute', 'ExecuteEvent', 'ExtendAxis', 'FFT', 'Fatal', 'Fill', 'FillBuffer', 'FillN', 'FillRandom', 'FindBin', 'FindFirstBinAbove', 'FindFixBin', 'FindLastBinAbove', 'FindObject', 'Fit', 'FitOptionsMake', 'FitPanel', 'GetArray', 'GetAsymmetry', 'GetAt', 'GetAxisColor', 'GetBarOffset', 'GetBarWidth', 'GetBin', 'GetBinCenter', 'GetBinContent', 'GetBinError', 'GetBinErrorLow', 'GetBinErrorOption', 'GetBinErrorUp', 'GetBinLowEdge', 'GetBinWidth', 'GetBinWithContent', 'GetBinXYZ', 'GetBuffer', 'GetBufferLength', 'GetBufferSize', 'GetCellContent', 'GetCellError', 'GetCenter', 'GetContour', 'GetContourLevel', 'GetContourLevelPad', 'GetCumulative', 'GetDefaultBufferSize', 'GetDefaultSumw2', 'GetDimension', 'GetDirectory', 'GetDrawOption', 'GetDtorOnly', 'GetEffectiveEntries', 'GetEntries', 'GetFillColor', 'GetFillStyle', 'GetFunction', 'GetIconName', 'GetIntegral', 'GetKurtosis', 'GetLabelColor', 'GetLabelFont', 'GetLabelOffset', 'GetLabelSize', 'GetLineColor', 'GetLineStyle', 'GetLineWidth', 'GetListOfFunctions', 'GetLowEdge', 'GetMarkerColor', 'GetMarkerLineWidth', 'GetMarkerSize', 'GetMarkerStyle', 'GetMarkerStyleBase', 'GetMaximum', 'GetMaximumBin', 'GetMaximumStored', 'GetMean', 'GetMeanError', 'GetMinimum', 'GetMinimumAndMaximum', 'GetMinimumBin', 'GetMinimumStored', 'GetName', 'GetNbinsX', 'GetNbinsY', 'GetNbinsZ', 'GetNcells', 'GetNdivisions', 'GetNormFactor', 
'GetObjectInfo', 'GetObjectStat', 'GetOption', 'GetPainter', 'GetQuantiles', 'GetRMS', 'GetRMSError', 'GetRandom', 'GetSize', 'GetSkewness', 'GetStatOverflows', 'GetStats', 'GetStdDev', 'GetStdDevError', 'GetSum', 'GetSumOfWeights', 'GetSumw2', 'GetSumw2N', 'GetTickLength', 'GetTitle', 'GetTitleFont', 'GetTitleOffset', 'GetTitleSize', 'GetUniqueID', 'GetXaxis', 'GetYaxis', 'GetZaxis', 'HandleTimer', 'HasInconsistentHash', 'Hash', 'ImplFileLine', 'ImplFileName', 'Info', 'InheritsFrom', 'Inspect', 'Integral', 'IntegralAndError', 'Interpolate', 'InvertBit', 'IsA', 'IsBinOverflow', 'IsBinUnderflow', 'IsDestructed', 'IsEqual', 'IsFolder', 'IsHighlight', 'IsOnHeap', 'IsSortable', 'IsTransparent', 'IsZombie', 'KolmogorovTest', 'LabelsDeflate', 'LabelsInflate', 'LabelsOption', 'MayNotUse', 'Merge', 'Modify', 'Multiply', 'Notify', 'Obsolete', 'Paint', 'Pop', 'Print', 'PutStats', 'Read', 'ReadArray', 'Rebin', 'RebinAxis', 'RebinX', 'Rebuild', 'RecursiveRemove', 'Reset', 'ResetAttFill', 'ResetAttLine', 'ResetAttMarker', 'ResetBit', 'ResetStats', 'SaveAs', 'SaveFillAttributes', 'SaveLineAttributes', 'SaveMarkerAttributes', 'SavePrimitive', 'Scale', 'Set', 'SetAt', 'SetAxisColor', 'SetAxisRange', 'SetBarOffset', 'SetBarWidth', 'SetBinContent', 'SetBinError', 'SetBinErrorOption', 'SetBins', 'SetBinsLength', 'SetBit', 'SetBuffer', 'SetCanExtend', 'SetCellContent', 'SetCellError', 'SetContent', 'SetContour', 'SetContourLevel', 'SetDefaultBufferSize', 'SetDefaultSumw2', 'SetDirectory', 'SetDrawOption', 'SetDtorOnly', 'SetEntries', 'SetError', 'SetFillAttributes', 'SetFillColor', 'SetFillColorAlpha', 'SetFillStyle', 'SetHighlight', 'SetLabelColor', 'SetLabelFont', 'SetLabelOffset', 'SetLabelSize', 'SetLineAttributes', 'SetLineColor', 'SetLineColorAlpha', 'SetLineStyle', 'SetLineWidth', 'SetMarkerAttributes', 'SetMarkerColor', 'SetMarkerColorAlpha', 'SetMarkerSize', 'SetMarkerStyle', 'SetMaximum', 'SetMinimum', 'SetName', 'SetNameTitle', 'SetNdivisions', 'SetNormFactor', 
'SetObjectStat', 'SetOption', 'SetStatOverflows', 'SetStats', 'SetTickLength', 'SetTitle', 'SetTitleFont', 'SetTitleOffset', 'SetTitleSize', 'SetUniqueID', 'SetXTitle', 'SetYTitle', 'SetZTitle', 'ShowBackground', 'ShowMembers', 'ShowPeaks', 'Sizeof', 'Smooth', 'SmoothArray', 'StatOverflows', 'Streamer', 'StreamerNVirtual', 'Sumw2', 'SysError', 'TestBit', 'TestBits', 'TransformHisto', 'UseCurrentStyle', 'Warning', 'Write', 'WriteArray', '__add__', '__assign__', '__bool__', '__class__', '__contains__', '__delattr__', '__destruct__', '__dict__', '__dir__', '__dispatch__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__gt__', '__hash__', '__imul__', '__init__', '__init_subclass__', '__invert__', '__le__', '__len__', '__lt__', '__module__', '__mul__', '__ne__', '__neg__', '__new__', '__pos__', '__python_owns__', '__radd__', '__reduce__', '__reduce_ex__', '__repr__', '__rmul__', '__rsub__', '__rtruediv__', '__setattr__', '__setitem__', '__sizeof__', '__smartptr__', '__str__', '__sub__', '__subclasshook__', '__truediv__', '__weakref__', '_getitem__unchecked', 'fArray', 'fN', 'kAllAxes', 'kAutoBinPTwo', 'kBitMask', 'kCanDelete', 'kCannotPick', 'kConsider', 'kHasUUID', 'kIgnore', 'kInconsistent', 'kInvalidObject', 'kIsAverage', 'kIsHighlight', 'kIsNotW', 'kIsOnHeap', 'kIsReferenced', 'kIsZoomed', 'kLogX', 'kMustCleanup', 'kNeutral', 'kNoAxis', 'kNoContextMenu', 'kNoStats', 'kNoTitle', 'kNormal', 'kNotDeleted', 'kNstat', 'kObjInCanvas', 'kOverwrite', 'kPoisson', 'kPoisson2', 'kSingleKey', 'kUserContour', 'kWriteDelete', 'kXaxis', 'kYaxis', 'kZaxis', 'kZombie', 'ls']

Grand. What do h.GetBinContent() and h.GetBin() return?

In this case, I would get the importance. However, to get the importance, I would need to call the function with the importance as an argument. So the function is not really giving the importance, but the importance histogrammed. I guess even the name is bad; it should be called GetImportanceHistogram() instead.
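If a histogram like the one GetImportance builds is at hand anyway, its labelled bins (filled with `SetBinLabel` and `SetBinContent` in the excerpt quoted earlier) can be read back into a plain dict. A sketch assuming a 1D histogram whose bins are labelled with the variable names; `histogram_to_dict` is a hypothetical helper that only relies on the standard TH1/TAxis methods `GetNbinsX`, `GetXaxis`, `GetBinLabel` and `GetBinContent`.

```python
def histogram_to_dict(hist):
    """Return {bin label: bin content} for a labelled 1D ROOT histogram."""
    axis = hist.GetXaxis()
    # ROOT bins are numbered from 1; bin 0 and bin N+1 hold underflow/overflow.
    return {
        axis.GetBinLabel(i): hist.GetBinContent(i)
        for i in range(1, hist.GetNbinsX() + 1)
    }
```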