Hi, I have an assignment due next week that I am stuck on, and I am hoping for some guidance. The problem is in preprocessing the data, so unless I can clear this up I will not get far with the assignment.

I have a column in my data frame, app_df['Size'], which has the object dtype. The brief is:

- Size column has sizes in Kb as well as Mb. To analyze, you’ll need to convert these to numeric.
- Extract the numeric value from the column
- Multiply the value by 1,000, if size is mentioned in Mb
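For reference, here is a rough sketch of how I read those three steps. The sample values are made up, and the regular expression and the names `parts`, `value`, `factor` and `size_kb` are only my assumptions about what the real column looks like, so this is an illustration of the brief rather than a confirmed solution:

```python
import pandas as pd

# Hypothetical sample values; the real app_df["Size"] may contain other formats.
sizes = pd.Series(["19M", "8.5M", "201k", "Varies with device"])

# Split each entry into a numeric part and an optional unit suffix (k or M).
# Entries that do not match the pattern give NaN in both columns.
parts = sizes.str.extract(r"^(?P<value>\d+(?:\.\d+)?)(?P<unit>[kKmM])?$")

# Convert the numeric part from string to float; unmatched rows stay NaN.
value = pd.to_numeric(parts["value"], errors="coerce")

# Multiply by 1,000 when the unit is M (as the brief says), otherwise by 1.
factor = parts["unit"].str.upper().map({"M": 1000, "K": 1}).fillna(1)
size_kb = value * factor

print(size_kb)
```

Entries with no recognisable number (such as the made-up "Varies with device") end up as NaN here, so they would still need to be dropped or filled before any analysis.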

I have tried

app_df['Size'] = app_df['Size'].astype('int')

and get the following error, which I don’t understand. If I can’t get the data into the right format, I won’t be able to extract the numeric values and perform the right mathematical operations.

ValueError Traceback (most recent call last)

~\AppData\Local\Temp/ipykernel_22780/4109999247.py in <module>

----> 1 app_df['Size'] = app_df['Size'].astype(int)

~\anaconda3\lib\site-packages\pandas\core\generic.py in astype(self, dtype, copy, errors)

5813 else:

5814 # else, only a single dtype is given

-> 5815 new_data = self._mgr.astype(dtype=dtype, copy=copy, errors=errors)

5816 return self._constructor(new_data).__finalize__(self, method="astype")

5817

~\anaconda3\lib\site-packages\pandas\core\internals\managers.py in astype(self, dtype, copy, errors)

416

417 def astype(self: T, dtype, copy: bool = False, errors: str = "raise") -> T:

-> 418 return self.apply("astype", dtype=dtype, copy=copy, errors=errors)

419

420 def convert(

~\anaconda3\lib\site-packages\pandas\core\internals\managers.py in apply(self, f, align_keys, ignore_failures, **kwargs)

325 applied = b.apply(f, **kwargs)

326 else:

-> 327 applied = getattr(b, f)(**kwargs)

328 except (TypeError, NotImplementedError):

329 if not ignore_failures:

~\anaconda3\lib\site-packages\pandas\core\internals\blocks.py in astype(self, dtype, copy, errors)

589 values = self.values

590

-> 591 new_values = astype_array_safe(values, dtype, copy=copy, errors=errors)

592

593 new_values = maybe_coerce_values(new_values)

~\anaconda3\lib\site-packages\pandas\core\dtypes\cast.py in astype_array_safe(values, dtype, copy, errors)

1307

1308 try:

-> 1309 new_values = astype_array(values, dtype, copy=copy)

1310 except (ValueError, TypeError):

1311 # e.g. astype_nansafe can fail on object-dtype of strings

~\anaconda3\lib\site-packages\pandas\core\dtypes\cast.py in astype_array(values, dtype, copy)

1255

1256 else:

-> 1257 values = astype_nansafe(values, dtype, copy=copy)

1258

1259 # in pandas we don't store numpy str dtypes, so convert to object

~\anaconda3\lib\site-packages\pandas\core\dtypes\cast.py in astype_nansafe(arr, dtype, copy, skipna)

1172 # work around NumPy brokenness, #1987

1173 if np.issubdtype(dtype.type, np.integer):

-> 1174 return lib.astype_intsafe(arr, dtype)

1175

1176 # if we have a datetime/timedelta array of objects

~\anaconda3\lib\site-packages\pandas\_libs\lib.pyx in pandas._libs.lib.astype_intsafe()

ValueError: invalid literal for int() with base 10: '19M'
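As far as I can tell, the last line is the part that matters: the value '19M' is being passed to int(), which cannot parse a number with a unit suffix attached. A minimal reproduction in plain Python, without pandas (the variable names `message` and `stripped` are just for illustration):

```python
# int() rejects the whole string because of the trailing "M".
try:
    int("19M")
except ValueError as exc:
    message = str(exc)
    print(message)

# With the suffix removed, the same conversion succeeds.
stripped = int("19M".rstrip("M"))
print(stripped)
```

So it looks like the unit letter has to be separated from the number before any numeric conversion can work.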