Xiuwen · @__icoder__
110 followers · 465 posts · Server sfba.social

Learning from Effective Pandas and Python for Data Analysis.

Recipe for permuting (randomly reordering) the rows of a DataFrame or Series:
new_order = np.random.permutation(len(df))
df.iloc[new_order]
df.take(new_order)
The last two lines are equivalent. To permute the columns of a DataFrame, build the permutation from the number of columns and add "axis='columns'" to .take().
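A minimal sketch of the recipe above. The DataFrame is made-up example data, and I swapped np.random.permutation for a seeded np.random.default_rng generator so the shuffle is repeatable; that substitution is my own choice, not something from the books.

```python
import numpy as np
import pandas as pd

# Hypothetical example data, not from the original notes.
df = pd.DataFrame({"a": [1, 2, 3, 4], "b": [10, 20, 30, 40]}, index=list("wxyz"))

# Seeded generator (an assumption for reproducibility).
rng = np.random.default_rng(0)

# Shuffle rows: permute the integer positions 0 .. len(df) - 1.
row_order = rng.permutation(len(df))
shuffled_rows = df.take(row_order)  # same result as df.iloc[row_order]

# Shuffle columns: the permutation must cover the number of columns instead.
col_order = rng.permutation(df.shape[1])
shuffled_cols = df.take(col_order, axis="columns")
```

Both .iloc[] and .take() select by position, so the index labels travel with their rows.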

Method for selecting a random subset of the rows of a DataFrame or Series:
df.sample(n=, frac=)
Pass either n= (a row count) or frac= (a fraction of the rows), not both. To sample with replacement, add "replace=True" to .sample().
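A small sketch of .sample() in use. The data is invented, and random_state= is added only to make the draws repeatable.

```python
import pandas as pd

# Hypothetical example data with 10 rows.
df = pd.DataFrame({"a": range(10)})

# Exactly 3 distinct rows.
three_rows = df.sample(n=3, random_state=0)

# Half of the rows (5 of 10).
half = df.sample(frac=0.5, random_state=0)

# Asking for more rows than exist only works with replacement.
boot = df.sample(n=15, replace=True, random_state=0)
```

Without replace=True, the last call would raise an error because 15 rows cannot be drawn from 10 without repeats.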

#effectivepandas #pythonfordataanalysis #learnpython #ProgressToday

Last updated 2 years ago

Xiuwen · @__icoder__
114 followers · 519 posts · Server sfba.social

@treyhunner I continued working through 2 books on pandas, Effective Pandas and Python for Data Analysis, and wrote a few toots as my notes

#effectivepandas #pythonfordataanalysis

Last updated 2 years ago

Xiuwen · @__icoder__
114 followers · 519 posts · Server sfba.social

Learning from Effective Pandas and Python for Data Analysis.

The preferred way to index and filter a Series or a DataFrame is i) with .loc[], which indexes on labels, or ii) with .iloc[], which indexes on integer positions. Their call signatures are nearly identical (shown here for .loc; .iloc takes the same forms):

.loc[rows]
.loc[:, cols]
.loc[rows, cols]

Their strength is that they make clear what we intend to index on and what we intend to select, which helps us not be the problem 😂
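To make the three call forms concrete, here is a short sketch with invented data. The same selections are written once with labels and once with positions.

```python
import pandas as pd

# Hypothetical example data, not from the books.
df = pd.DataFrame(
    {"price": [1.5, 2.0, 3.25], "qty": [10, 0, 7]},
    index=["apple", "pear", "plum"],
)

# Label-based selection with .loc[]
row_by_label = df.loc["pear"]             # .loc[rows]
col_by_label = df.loc[:, "qty"]           # .loc[:, cols]
cell_by_label = df.loc["plum", "price"]   # .loc[rows, cols]

# Position-based selection with .iloc[]
row_by_pos = df.iloc[1]
col_by_pos = df.iloc[:, 1]
cell_by_pos = df.iloc[2, 0]

# Filtering: a boolean mask picks the rows, a list of labels picks the columns.
in_stock = df.loc[df["qty"] > 0, ["price"]]
```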

#effectivepandas #pythonfordataanalysis #learnpython #ProgressToday

Last updated 2 years ago

Xiuwen · @__icoder__
96 followers · 398 posts · Server sfba.social

Continued my way through Effective Pandas and Python for Data Analysis.

Element-wise transformation of a Series' values or an Index's labels can be done by passing a dictionary (to translate selected elements; values missing from the dictionary become NaN) or a function (applied to every element) to the method

.map(dict or func)
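A quick sketch of both forms of .map(), using made-up values:

```python
import pandas as pd

# Hypothetical example data.
s = pd.Series(["cat", "dog", "bird"])

# Dictionary: only keys that are present get translated; "bird" has no entry.
mapped = s.map({"cat": "feline", "dog": "canine"})

# Function: called once per element.
lengths = s.map(len)

# Index.map() works the same way on labels.
upper_idx = pd.Index(["a", "b"]).map(str.upper)
```

If unmapped values should stay as they are rather than turn into missing values, .replace() with the same dictionary is one alternative worth checking.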

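Binning can be done with the top-level functions

pd.cut(data, bins, right=, labels=, precision=)
pd.qcut(data, q)

pd.cut() bins by value: pass either explicit bin edges or an integer number of equal-width bins. pd.qcut() bins by sample quantiles: pass either a number of quantiles or a list of quantile edges, so the bins hold roughly equal numbers of observations.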
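A sketch contrasting the two binning functions on invented ages. The bin edges and labels are my own choices for illustration.

```python
import pandas as pd

# Hypothetical example data.
ages = pd.Series([5, 17, 18, 30, 64, 65, 90])

# Explicit edges; right=False closes each bin on the left:
# [0, 18), [18, 65), [65, 120)
groups = pd.cut(
    ages, bins=[0, 18, 65, 120], right=False,
    labels=["child", "adult", "senior"],
)

# An integer asks for that many equal-width bins spanning the data range.
equal_width = pd.cut(ages, bins=3)

# qcut: quantile-based bins; labels=False returns integer bin codes.
quartiles = pd.qcut(ages, q=4, labels=False)
```

With right=False, the edge values 18 and 65 fall into the bin that starts at them, so 18 is "adult" and 65 is "senior".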

#ProgressToday #effectivepandas #pythonfordataanalysis #learnpython

Last updated 2 years ago

Xiuwen · @__icoder__
90 followers · 358 posts · Server sfba.social

Finished the sections in Effective Pandas and Python for Data Analysis on converting the data types of a Series or column. Top tools:

.astype(dtype, copy=, errors=)
.convert_dtypes()
pd.to_datetime()
pd.CategoricalDtype(categories=, ordered=)

The first is a general-purpose method for Python and NumPy types, while the second converts to pandas extension types that support NA.

Before converting data types, be sure to handle any codes that stand in for missing data or errors.
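A short sketch of the four tools listed above. All values are invented, and treating "-999" as a missing-data code is purely an assumption for the example.

```python
import pandas as pd

# Hypothetical raw column where "-999" is a sentinel code for missing data.
raw = pd.Series(["1", "2", "-999"])

# Handle the sentinel first, then convert with .astype().
cleaned = raw.mask(raw == "-999").astype("float64")

# .convert_dtypes() picks NA-aware extension types, e.g. Int64 instead of float64.
ints = pd.Series([1, 2, None]).convert_dtypes()

# Strings to datetimes.
dates = pd.to_datetime(pd.Series(["2023-01-15", "2023-02-01"]))

# An ordered categorical type, applied through .astype().
size_type = pd.CategoricalDtype(categories=["small", "medium", "large"], ordered=True)
sizes = pd.Series(["small", "large", "medium"]).astype(size_type)
```

Because the categories are ordered, comparisons such as sizes > "small" and calls like sizes.max() follow the declared order rather than alphabetical order.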

#ProgressToday #effectivepandas #pythonfordataanalysis

Last updated 2 years ago

Xiuwen · @__icoder__
89 followers · 347 posts · Server sfba.social

Finished going through the sections in Effective Pandas and Python for Data Analysis related to duplicated data and cleaning. It's good that the two important methods apply to all three objects - Series, DataFrame, and Index:
.duplicated(subset=, keep=)
.drop_duplicates(subset=, keep=)
One difference is that the kwarg 'subset=' applies to DataFrame objects only, which can have multiple columns to choose from.
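A sketch of both methods on made-up data, showing subset= on a DataFrame and the plain calls on a Series and an Index:

```python
import pandas as pd

# Hypothetical example data.
df = pd.DataFrame({
    "name": ["ann", "bob", "ann", "ann"],
    "city": ["SF", "LA", "SF", "NY"],
})

# Whole-row duplicates: only row 2 repeats an earlier row exactly.
full_dupes = df.duplicated()

# Judge duplicates by "name" only, keeping the last occurrence unmarked.
name_dupes = df.duplicated(subset="name", keep="last")

# Drop repeats of "name", keeping the first occurrence.
deduped = df.drop_duplicates(subset="name", keep="first")

# Series and Index take no subset= argument.
unique_names = df["name"].drop_duplicates()
unique_labels = pd.Index(["a", "b", "a"]).drop_duplicates()
```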

#ProgressToday #effectivepandas #pythonfordataanalysis

Last updated 2 years ago

Xiuwen · @__icoder__
88 followers · 341 posts · Server sfba.social

Finished going over the sections in Effective Pandas and Python for Data Analysis related to handling missing data. Here are useful methods on this topic:
.isna()
.notna()
.dropna(how=, thresh=, axis=)
.fillna(value=, method=, limit=, axis=)
.interpolate(method=, limit=, axis=)
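A sketch of these methods on invented data. One caveat: newer pandas releases deprecate the method= argument of .fillna() in favor of .ffill() and .bfill(), so the forward fill below uses .ffill().

```python
import numpy as np
import pandas as pd

# Hypothetical example data with gaps.
df = pd.DataFrame({
    "x": [1.0, np.nan, 3.0, np.nan],
    "y": [np.nan, np.nan, 6.0, 8.0],
})

missing_mask = df.isna()    # True where a value is missing
present_mask = df.notna()   # the inverse

complete_rows = df.dropna()                 # drop rows with any missing value
not_all_missing = df.dropna(how="all")      # drop rows where everything is missing
at_least_two = df.dropna(thresh=2)          # keep rows with >= 2 non-missing values

filled_const = df.fillna(value=0)           # fill with a constant
filled_forward = df.ffill(limit=1)          # carry the last value forward, at most 1 step

interpolated = df["x"].interpolate()        # linear interpolation by default
```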

#ProgressToday #effectivepandas #pythonfordataanalysis

Last updated 2 years ago