
Encoding Categorical Data for Outlier Detection

https://towardsdatascience.com/encoding-categorical-data-for-outlier-detection/ (towardsdatascience.com)
Encoding categorical data is a necessary step for outlier detection because most detection algorithms require purely numeric input, typically to calculate distances between data points. One-hot encoding is a common approach, but it handles high-cardinality features poorly: every distinct category becomes its own column, so the column count grows with the number of categories. The piece contrasts encoding methods for prediction with those for unsupervised outlier detection, where no target column is available. It presents count encoding as a strong alternative to one-hot encoding, especially for high-cardinality data, because it replaces the feature with a single numeric column holding each category's frequency.
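To make the contrast concrete, here is a minimal sketch of count encoding next to the column count one-hot encoding would need. It is not taken from the article: the helper names `count_encode` and `one_hot_width` and the sample `colours` column are hypothetical, and only the Python standard library is used.

```python
from collections import Counter


def count_encode(values):
    """Replace each category with the number of times it appears in `values`."""
    counts = Counter(values)
    return [counts[v] for v in values]


def one_hot_width(values):
    """Number of columns one-hot encoding would create for `values`."""
    return len(set(values))


# Hypothetical categorical column in which "green" is rare.
colours = ["red", "red", "blue", "red", "blue", "green", "red", "blue"]

encoded = count_encode(colours)
print(encoded)                 # a single numeric column, one value per row
print(one_hot_width(colours))  # columns one-hot encoding would need instead
```

Rare categories end up with small values, which is what makes the encoded column usable by a numeric outlier detector. One trade-off of this sketch: two different categories that occur equally often receive the same value and can no longer be told apart.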
0 points by will222 hours ago
