Tabula Rasa: Why Do Tree-Based Algorithms Outperform Neural Networks?
Artificial intelligence has made great strides in recent years, and ChatGPT has stunned the world. Yet although we have seen incredible applications for both images and text, tabular data is still a problem for neural networks (so much so that Kadra in 2021 called it the “unconquered castle” for neural networks). But why does it remain a problem?
This article is divided into different sections. For each section, we will answer these questions:
Tabular data is a sub-branch of structured data. Simply put, it is any data that can be laid out as a table (like an Excel sheet) in which, by convention, the rows represent examples and the columns represent features.
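To make the row/column convention concrete, here is a minimal sketch in plain Python. The table contents and the names `columns`, `rows`, and `column` are invented for illustration; they do not come from any particular dataset or library.

```python
# A tiny, made-up table: each row is one example, each column is one feature.
columns = ["age", "income", "city", "is_customer"]
rows = [
    [34, 52000.0, "Rome", True],
    [51, 87500.5, "Oslo", False],
    [27, 31000.0, "Lima", True],
]

n_examples = len(rows)     # number of rows
n_features = len(columns)  # number of columns


def column(name):
    """Return all values of one feature, i.e. one column of the table."""
    idx = columns.index(name)
    return [row[idx] for row in rows]


# Python type of each feature, read from the first row (illustrative only).
feature_types = {
    name: type(value).__name__ for name, value in zip(columns, rows[0])
}

print(f"{n_examples} examples x {n_features} features")
print(column("city"))
print(feature_types)
```

Reading along a row gives everything known about one example; reading down a column gives one feature across all examples. In practice a library such as pandas would usually hold this structure, but the convention is the same.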
Paradoxically, despite its simplicity, the tabular format is where most real-world data lives, in fields such as finance, medicine, climate science, and manufacturing.
In contrast to images, text, or audio (called homogeneous data because they have only one type of feature), tabular data is heterogeneous, since it can contain multiple types of features: