

KLIB LIBRARY PYTHON INSTALL
klib is a Python library for importing, cleaning, analyzing and preprocessing data. Explanations on key functionalities can be found on Medium / TowardsDataScience, in the examples section, or on YouTube (Data Professor).

Installation

Use the package manager pip to install klib:

    pip install -U klib

Alternatively, to install this package with conda run:

    conda install -c conda-forge klib

# klib.describe - functions for visualizing datasets
klib.cat_plot(df) # returns a visualization of the number and frequency of categorical features
klib.corr_mat(df) # returns a color-encoded correlation matrix

klib.corr_plot(df) # returns a color-encoded heatmap, ideal for correlations
klib.dist_plot(df) # returns a distribution plot for every numeric feature
klib.missingval_plot(df) # returns a figure containing information about missing values

# klib.clean - functions for cleaning datasets
klib.data_cleaning(df) # performs data cleaning (drops duplicates and empty rows/cols, adjusts dtypes)
klib.clean_column_names(df) # cleans and standardizes column names, also called inside data_cleaning()
klib.convert_datatypes(df) # converts existing to more efficient dtypes, also called inside data_cleaning()
klib.drop_missing(df) # drops missing values, also called in data_cleaning()
klib.mv_col_handling(df) # drops features with high ratio of missing vals based on informational content
klib.pool_duplicate_subsets(df) # pools subset of cols based on duplicates with min. loss of information
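To make the cleaning steps named in the comment on klib.data_cleaning() more concrete, here is a minimal plain-pandas sketch of dropping empty rows and columns, dropping duplicates, and adjusting dtypes. It is an approximation under stated assumptions, not klib's implementation: the helper name basic_cleaning and the sample columns are hypothetical, and klib's own defaults and ordering may differ.

```python
import numpy as np
import pandas as pd


def basic_cleaning(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: a rough pandas approximation, not klib code."""
    cleaned = df.dropna(axis=0, how="all")       # drop rows that are entirely empty
    cleaned = cleaned.dropna(axis=1, how="all")  # drop columns that are entirely empty
    cleaned = cleaned.drop_duplicates().reset_index(drop=True)  # drop repeated rows
    return cleaned.convert_dtypes()              # let pandas infer more specific dtypes


# Illustrative data only: one duplicate row, one empty row, one empty column.
raw = pd.DataFrame(
    {
        "wine": ["red", "red", None, "white"],
        "score": [90.0, 90.0, np.nan, 85.0],
        "empty": [np.nan, np.nan, np.nan, np.nan],
    }
)

cleaned = basic_cleaning(raw)
print(cleaned)
print(cleaned.dtypes)
```

With klib installed, the corresponding one-liner from the list above would be klib.data_cleaning(raw); the sketch only illustrates the kind of work that call bundles together.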

Examples

Find all available examples as well as applications of the functions in klib.clean() with detailed descriptions here.

klib.missingval_plot(df) # default representation of missing values in a DataFrame, plenty of settings are available
klib.corr_plot(df, split='pos') # displaying only positive correlations, other settings include threshold, cmap
klib.corr_plot(df, split='neg') # displaying only negative correlations
klib.corr_plot(df, target='wine') # default representation of correlations with the feature column
klib.dist_plot(df) # default representation of a distribution plot, other settings include fill_range, histogram
klib.cat_plot(data, top=4, bottom=4) # representation of the 4 most & least common values in each categorical column

Further examples, as well as applications of the functions in klib.clean(), can be found here.

Pull requests and ideas, especially for further functions, are welcome. For major changes or feedback, please open an issue first to discuss what you would like to change.
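As a supplement to the corr_plot examples, the split and target options can be pictured with plain pandas. This is a sketch of the underlying idea only: it assumes split='pos' and split='neg' correspond to keeping positive or negative coefficients, and target to selecting one column of the correlation matrix. The column names are hypothetical, and klib's plotting, thresholds and colormaps are not reproduced.

```python
import pandas as pd

# Illustrative data: "quality" rises with "alcohol" and falls with "acidity".
df = pd.DataFrame(
    {
        "alcohol": [1.0, 2.0, 3.0, 4.0],
        "quality": [2.0, 4.0, 6.0, 8.0],
        "acidity": [4.0, 3.0, 2.0, 1.0],
    }
)

corr = df.corr()  # pairwise Pearson correlations

pos_only = corr.where(corr > 0)  # keep positive coefficients, mask the rest as NaN
neg_only = corr.where(corr < 0)  # keep negative coefficients, mask the rest as NaN

# Correlation of every other feature with one chosen column, strongest first.
target_corr = corr["quality"].drop("quality").sort_values(ascending=False)

print(pos_only)
print(neg_only)
print(target_corr)
```

Masking with NaN rather than dropping rows keeps the matrix square, which is convenient if the result is later passed to a heatmap function.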
