Learning from #EffectivePandas and #PythonForDataAnalysis.
Recipe for permuting (randomly reordering) the rows of a DataFrame or Series with n rows:
new_order = np.random.permutation(n)
df.iloc[new_order]
df.take(new_order)
To permute the columns of a DataFrame, add "axis='columns'" to .take() and build new_order from the number of columns instead of the number of rows.
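A minimal sketch of the recipe above, on a small made-up DataFrame. The column names and the seeded np.random.default_rng generator are my own choices for illustration (the notes use np.random.permutation directly, which works the same way but is unseeded here).

```python
import numpy as np
import pandas as pd

# Small hypothetical frame, for illustration only.
df = pd.DataFrame(
    {"a": [10, 20, 30, 40], "b": list("wxyz"), "c": [1.5, 2.5, 3.5, 4.5]},
    index=["r0", "r1", "r2", "r3"],
)

rng = np.random.default_rng(seed=0)  # seeded so the shuffle is repeatable

# Rows: permute the integer positions 0..n-1, then select by position.
row_order = rng.permutation(len(df))
shuffled_iloc = df.iloc[row_order]
shuffled_take = df.take(row_order)  # same result as .iloc[row_order]

# Columns: the permutation must cover the number of columns, not rows.
col_order = rng.permutation(df.shape[1])
shuffled_cols = df.take(col_order, axis="columns")
```

Both row variants keep each row's index label attached to its data, so the original order can be restored with .sort_index().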
Method for selecting a random subset of the rows of a DataFrame or Series (pass n= or frac=, not both):
df.sample(n=, frac=)
To allow for replacement, add "replace=True" to .sample().
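A quick sketch of .sample() on made-up data. The random_state argument is not in my notes above; I add it here only so the draws are repeatable.

```python
import pandas as pd

df = pd.DataFrame({"x": range(10)})  # hypothetical data

three_rows = df.sample(n=3, random_state=0)      # exactly 3 distinct rows
half_rows = df.sample(frac=0.5, random_state=0)  # 50% of the rows

# With replacement a row may be drawn more than once,
# so n is allowed to exceed the number of rows.
boot = df.sample(n=15, replace=True, random_state=0)
```

Without replace=True, asking for more rows than the object holds raises an error.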
#effectivepandas #pythonfordataanalysis #learnpython #ProgressToday
@treyhunner I continued working through two books on pandas, #EffectivePandas and #PythonForDataAnalysis, and wrote a few toots as my notes.
#effectivepandas #pythonfordataanalysis
Learning from #EffectivePandas and #PythonForDataAnalysis.
The preferred way to index and filter a Series or a DataFrame is i) with .loc[] indexing on index labels or ii) with .iloc[] indexing on integer positions. The two accessors take the same forms (shown for .loc[]; .iloc[] is analogous):
.loc[rows]
.loc[:, cols]
.loc[rows, cols]
Their strength is that they make explicit what we intend to index on (labels or positions) and what we intend to select (rows, columns, or both), which helps us not be the problem 😂
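A small sketch of the two accessors side by side, on a made-up DataFrame (the fruit labels and column names are hypothetical). One difference worth remembering: label slices in .loc[] include the end label, while position slices in .iloc[] exclude the stop, like ordinary Python slices.

```python
import pandas as pd

df = pd.DataFrame(
    {"price": [1.0, 2.0, 3.0], "qty": [5, 6, 7]},
    index=["apple", "banana", "cherry"],
)

# Label-based: the slice includes both endpoints.
by_label = df.loc["apple":"banana", ["price"]]

# Position-based: the slice excludes the stop position.
by_position = df.iloc[0:2, [0]]

# A boolean mask is a common "rows" argument when filtering with .loc[].
expensive_qty = df.loc[df["price"] > 1.5, "qty"]
```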
#effectivepandas #pythonfordataanalysis #learnpython #ProgressToday
#ProgressToday Continued my way through #EffectivePandas and #PythonForDataAnalysis.
Element-wise transformation of a Series' values or an Index's labels can be done by passing a dictionary (a lookup table; values with no matching key become NaN) or a function (applied to every element) to the method
.map(dict or func)
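A minimal sketch of .map() with both kinds of argument, on made-up data. Note what happens to "bird", which has no key in the dictionary.

```python
import pandas as pd

s = pd.Series(["cat", "dog", "bird"])  # hypothetical data

# Dictionary: values without a key in the mapping become NaN.
legs = s.map({"cat": 4, "dog": 4})

# Function: applied to every element.
upper = s.map(str.upper)

# Index labels can be transformed the same way.
idx = pd.Index(["a", "b"]).map(str.upper)
```

To change only some values and keep the rest, .replace() or a function with a fallback is usually a better fit than a partial dictionary.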
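Binning can be done with the top-level functions
pd.cut(data, bins or nbins, right=, labels=, precision=)
pd.qcut(data, quantiles or nquantiles)
pd.cut() bins by data values (explicit edges, or equal-width bins when given a count), while pd.qcut() bins by sample quantiles, so the bins hold roughly equal numbers of observations.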
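A short sketch contrasting the two, on a made-up Series of ages. The bin edges and labels are my own choices for illustration.

```python
import pandas as pd

ages = pd.Series([5, 17, 18, 30, 64, 65, 90])  # hypothetical data

# pd.cut: bins defined by value edges; the default right=True
# makes each interval closed on the right, e.g. (0, 17].
groups = pd.cut(ages, bins=[0, 17, 64, 120], labels=["child", "adult", "senior"])

# pd.qcut: bins defined by sample quantiles (here, split at the median).
halves = pd.qcut(ages, q=2, labels=["low", "high"])
```

Both return a categorical result, so .value_counts() gives the bin sizes directly.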
#ProgressToday #effectivepandas #pythonfordataanalysis #learnpython
Continued my way through #EffectivePandas and #PythonForDataAnalysis.
Element-wise transformation of a Series' values or an Index's labels can be done by passing a dictionary (a lookup table; values with no matching key become NaN) or a function (applied to every element) to the method
.map(dict or func)
Binning of a Series or column can be done by i) the data values, or ii) the data quantiles, using the top-level functions:
pd.cut(data, bins or nbins, right=, labels=, precision=)
pd.qcut(data, quantiles or nquantiles)
#ProgressToday #effectivepandas #pythonfordataanalysis #learnpython
#ProgressToday Finished the sections in #EffectivePandas and #PythonForDataAnalysis on converting the data types of a Series or column. Top methods:
.astype(dtype, copy=, errors=)
.convert_dtypes()
pd.to_datetime()
pd.CategoricalDtype(categories=, ordered=)
.astype() is the general-purpose one for Python and NumPy types, while .convert_dtypes() converts to the pandas extension types that support NA.
Before converting data types, be sure to deal with any codes that stand for missing data or errors, or they will be converted as if they were real values.
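A sketch of the four tools on made-up data. The "-999" missing-data code, the dates, and the size categories are hypothetical; the point is the order of operations (handle the code first, then convert).

```python
import pandas as pd

raw = pd.Series(["1", "2", "-999"])  # "-999" is a hypothetical missing-data code

# Parse to integers, blank out the code, then move to the nullable "Int64" type.
nums = raw.astype("int64")
clean = nums.where(nums != -999).astype("Int64")

# Let pandas pick NA-aware extension types for each column.
inferred = pd.DataFrame({"n": [1, 2], "s": ["a", "b"]}).convert_dtypes()

# Strings to datetimes.
dates = pd.to_datetime(pd.Series(["2024-01-05", "2024-02-10"]))

# An ordered categorical makes comparisons such as "smaller than L" meaningful.
size_type = pd.CategoricalDtype(categories=["S", "M", "L"], ordered=True)
sizes = pd.Series(["S", "L", "M"]).astype(size_type)
```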
#ProgressToday #effectivepandas #pythonfordataanalysis
#ProgressToday Finished going through sections in #EffectivePandas and #PythonForDataAnalysis related to duplicated data and cleaning. It's good that the two important methods apply to all three objects - Series, DataFrame, and Index:
.duplicated(subset=, keep=)
.drop_duplicates(subset=, keep=)
One difference: the kwarg 'subset=' applies to DataFrame objects only, since only they have multiple columns to choose from.
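A small sketch of both methods, on a made-up DataFrame and Index (the column names "k" and "v" are hypothetical).

```python
import pandas as pd

df = pd.DataFrame({"k": ["a", "a", "b", "b"], "v": [1, 1, 2, 3]})

# Whole-row comparison: only the second row repeats an earlier one.
full_dupes = df.duplicated()

# Compare on column "k" only; keep="last" flags every occurrence but the last.
key_dupes = df.duplicated(subset=["k"], keep="last")

# Keep the first row seen for each value of "k".
deduped = df.drop_duplicates(subset=["k"], keep="first")

# Index objects support the same method, without subset=.
unique_index = pd.Index(["x", "y", "x"]).drop_duplicates()
```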
#ProgressToday #effectivepandas #pythonfordataanalysis
#ProgressToday Finished going over sections in #EffectivePandas and #PythonForDataAnalysis related to handling missing data. Here are useful methods on this topic:
.isna()
.notna()
.dropna(how=, thresh=, axis=)
.fillna(value=, method=, limit=, axis=)
.interpolate(method=, limit=, axis=)
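A minimal sketch of these on a made-up Series. One caveat: recent pandas versions deprecate the method= argument of .fillna() in favour of .ffill() and .bfill(), so the forward fill below uses .ffill() instead.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, np.nan, 4.0])  # hypothetical data

mask = s.isna()                # True where a value is missing
dropped = s.dropna()           # remove missing entries
filled_const = s.fillna(0.0)   # replace with a constant
filled_fwd = s.ffill(limit=1)  # carry the last value forward, at most one step
interp = s.interpolate()       # default method is linear
```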
#ProgressToday #effectivepandas #pythonfordataanalysis