Hi, I have an assignment due next week that I am stuck on, and I am hoping for some guidance. The problem is in preprocessing the data, so until I clear this up I can’t make much progress on the assignment.
I have a column in my data frame, app_df['Size'], which has the object dtype. The brief is:
- Size column has sizes in Kb as well as Mb. To analyze, you’ll need to convert these to numeric.
- Extract the numeric value from the column
- Multiply the value by 1,000, if size is mentioned in Mb
I have tried
app_df['Size'] = app_df['Size'].astype('int')
and get the following error, which I don’t understand. If I can’t get the data into the right format, I won’t be able to extract the numeric values and perform the required mathematical operations.
ValueError Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_22780/4109999247.py in
----> 1 app_df['Size'] = app_df['Size'].astype(int)
~\anaconda3\lib\site-packages\pandas\core\generic.py in astype(self, dtype, copy, errors)
5813 else:
5814 # else, only a single dtype is given
→ 5815 new_data = self._mgr.astype(dtype=dtype, copy=copy, errors=errors)
5816 return self._constructor(new_data).__finalize__(self, method="astype")
5817
~\anaconda3\lib\site-packages\pandas\core\internals\managers.py in astype(self, dtype, copy, errors)
416
417 def astype(self: T, dtype, copy: bool = False, errors: str = "raise") -> T:
→ 418 return self.apply("astype", dtype=dtype, copy=copy, errors=errors)
419
420 def convert(
~\anaconda3\lib\site-packages\pandas\core\internals\managers.py in apply(self, f, align_keys, ignore_failures, **kwargs)
325 applied = b.apply(f, **kwargs)
326 else:
→ 327 applied = getattr(b, f)(**kwargs)
328 except (TypeError, NotImplementedError):
329 if not ignore_failures:
~\anaconda3\lib\site-packages\pandas\core\internals\blocks.py in astype(self, dtype, copy, errors)
589 values = self.values
590
→ 591 new_values = astype_array_safe(values, dtype, copy=copy, errors=errors)
592
593 new_values = maybe_coerce_values(new_values)
~\anaconda3\lib\site-packages\pandas\core\dtypes\cast.py in astype_array_safe(values, dtype, copy, errors)
1307
1308 try:
→ 1309 new_values = astype_array(values, dtype, copy=copy)
1310 except (ValueError, TypeError):
1311 # e.g. astype_nansafe can fail on object-dtype of strings
~\anaconda3\lib\site-packages\pandas\core\dtypes\cast.py in astype_array(values, dtype, copy)
1255
1256 else:
→ 1257 values = astype_nansafe(values, dtype, copy=copy)
1258
1259 # in pandas we don’t store numpy str dtypes, so convert to object
~\anaconda3\lib\site-packages\pandas\core\dtypes\cast.py in astype_nansafe(arr, dtype, copy, skipna)
1172 # work around NumPy brokenness, #1987
1173 if np.issubdtype(dtype.type, np.integer):
→ 1174 return lib.astype_intsafe(arr, dtype)
1175
1176 # if we have a datetime/timedelta array of objects
~\anaconda3\lib\site-packages\pandas\_libs\lib.pyx in pandas._libs.lib.astype_intsafe()
ValueError: invalid literal for int() with base 10: '19M'
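The last line of the traceback shows the cause: the column holds strings such as '19M', and int() cannot parse a value with a trailing unit letter, so astype(int) raises. One possible way to follow the brief is to split each value into its numeric part and its unit first, then scale the Mb rows by 1,000. The sketch below is only an illustration under assumptions: the sample values, the helper name size_to_kb and the new column name Size_kb are hypothetical, and it assumes every valid size looks like a number followed by k or M. Anything else becomes NaN rather than raising.

```python
import pandas as pd

# Hypothetical sample values standing in for app_df['Size'];
# the real column may contain other formats.
app_df = pd.DataFrame({"Size": ["19M", "8.7M", "201k", "Varies with device"]})


def size_to_kb(sizes: pd.Series) -> pd.Series:
    """Convert strings such as '19M' or '201k' to a float number of Kb.

    Assumption: a valid size is a number followed by k or M (optionally
    'b'). Values that do not match this pattern are returned as NaN.
    """
    text = sizes.astype(str).str.strip()
    # Two capture groups: column 0 is the number, column 1 is the unit letter.
    parts = text.str.extract(r"^(\d+(?:\.\d+)?)\s*([kKmM])[bB]?$", expand=True)
    number = pd.to_numeric(parts[0], errors="coerce")
    # Mb values are multiplied by 1,000 as the brief asks; Kb values by 1.
    factor = parts[1].str.upper().map({"M": 1000.0, "K": 1.0})
    return number * factor


# Write to a new column so the original strings stay available for checking.
app_df["Size_kb"] = size_to_kb(app_df["Size"])
print(app_df)
```

The result is a float column because NaN cannot be stored in a plain int column. If whole numbers are needed, the NaN rows would have to be dropped or filled before calling astype(int).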