In this work, we dive into the fundamental challenges of evaluating Text2SQL solutions and highlight potential causes of failure and the risks of relying on aggregate metrics in existing benchmarks. We identify two limitations that current open benchmarks largely leave unaddressed: (1) data quality issues in the evaluation data, mainly attributed to the failure to capture the probabilistic nature of translating a natural language (NL) description into a structured query (e.g., NL ambiguity), and (2) the bias introduced by using different match functions as approximations for SQL equivalence. To put both limitations in context, we propose a unified taxonomy of all Text2SQL limitations that can lead to prediction and evaluation errors. We then motivate the taxonomy by providing a survey of Text2SQL limitations using state-of-the-art solutions and benchmarks. We describe the causes of these limitations with real-world examples and propose potential mitigation solutions for each category in the taxonomy. We conclude by highlighting the open challenges in implementing such mitigation strategies and in automatically applying the taxonomy across all categories.
† University of Waterloo