:::info Authors:
(1) Mark Potanin, Corresponding author ([email protected]);
(2) Andrey Chertok ([email protected]);
(3) Konstantin Zorin ([email protected]);
(4) Cyril Shtabtsovsky ([email protected]).
:::
Table of Links
3 Dataset Overview, Preprocessing, and Features
3.1 Successful Companies Dataset and 3.2 Unsuccessful Companies Dataset
4 Model Training, Evaluation, and Portfolio Simulation and 4.1 Backtest
5 Other approaches
5.2 Founders ranking model and 5.3 Unicorn recommendation model
7 Further Research, References and Appendix
3 Dataset Overview, Preprocessing, and Features
\ We used the daily Crunchbase database export (Daily CSV Export) as the primary data source; the same data is also accessible through a well-documented API. The main goal of this research was to collect a labeled dataset for training a deep learning model to classify companies as either successful or unsuccessful.
\ The analysis was based on the Daily CSV Export from 2022-06-14, and only companies established on or after 2000-01-01 were taken into account. To refine the focus of the research, only companies within specific categories were included, such as Software, Internet Services, Hardware, Information Technology, Media and Entertainment, Commerce and Shopping, Mobile, Data and Analytics, Financial Services, Sales and Marketing, Apps, Advertising, Artificial Intelligence, Professional Services, Privacy and Security, Video, Content and Publishing, Design, Payments, Gaming, Messaging and Telecommunications, Music and Audio, Platforms, Education, and Lending and Investments.
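\ The date and category filters described above can be sketched as follows. This is a minimal illustration, not the authors' code: the field names `founded_on` and `category_groups_list`, the comma-separated category format, and the rule that a company qualifies if it belongs to at least one listed category are all assumptions made for the example.

```python
from datetime import date

# Categories listed in the text above.
ALLOWED_CATEGORIES = {
    "Software", "Internet Services", "Hardware", "Information Technology",
    "Media and Entertainment", "Commerce and Shopping", "Mobile",
    "Data and Analytics", "Financial Services", "Sales and Marketing",
    "Apps", "Advertising", "Artificial Intelligence", "Professional Services",
    "Privacy and Security", "Video", "Content and Publishing", "Design",
    "Payments", "Gaming", "Messaging and Telecommunications",
    "Music and Audio", "Platforms", "Education", "Lending and Investments",
}
MIN_FOUNDED = date(2000, 1, 1)  # companies established on or after this date


def keep_company(record):
    """Return True if a company record passes the date and category filters.

    `record` is a dict with hypothetical keys `founded_on` (ISO date string)
    and `category_groups_list` (comma-separated category names).
    """
    try:
        founded = date.fromisoformat(record.get("founded_on") or "")
    except ValueError:
        return False  # assumption: missing or malformed dates are excluded
    if founded < MIN_FOUNDED:
        return False
    raw = record.get("category_groups_list") or ""
    categories = {c.strip() for c in raw.split(",") if c.strip()}
    # Assumption: one matching category is enough to keep the company.
    return bool(categories & ALLOWED_CATEGORIES)
```

Records loaded with `csv.DictReader` could be passed through `keep_company` one at a time, so the full export would not need to be held in memory.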
\ This research focuses on investment rounds occurring after round B. However, in the Crunchbase data glossary, rounds such as series_unknown, private_equity, and undisclosed have unclear characteristics. We therefore included these ambiguous rounds in a company’s funding round history only if they occurred after round B.
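\ The rule for ambiguous rounds can be illustrated with a short sketch. It is one possible reading of the text, not the authors' implementation: the round type strings (written here with underscores), the `(announced_on, investment_type)` tuple layout, and the choice to interpret "after round B" as "dated strictly after the company's earliest `series_b` round" are assumptions.

```python
from datetime import date

# Round types the text describes as having unclear characteristics.
AMBIGUOUS_TYPES = {"series_unknown", "private_equity", "undisclosed"}


def filter_rounds(rounds):
    """Drop ambiguous rounds unless they occur after the first series B round.

    `rounds` is a list of (announced_on: date, investment_type: str) tuples
    for a single company. Non-ambiguous rounds are always kept.
    """
    ordered = sorted(rounds, key=lambda r: r[0])
    b_dates = [d for d, kind in ordered if kind == "series_b"]
    first_b = min(b_dates) if b_dates else None

    kept = []
    for announced_on, kind in ordered:
        if kind in AMBIGUOUS_TYPES:
            # Keep an ambiguous round only if a series B exists and precedes it.
            if first_b is not None and announced_on > first_b:
                kept.append((announced_on, kind))
        else:
            kept.append((announced_on, kind))
    return kept
```

Under this reading, a company with no series B round loses all of its ambiguous rounds, because their stage cannot be placed relative to round B.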
\
:::info This paper is available on arXiv under CC 4.0 license.
:::