Generative Documentation of Data Lineage and ETL Logic

Authors

  • Vasudevan Ananthakrishnan, Yakshna Solutions Inc, USA
  • Bhaskar Yakkanti, MGM Resorts, USA
  • Sai Charan Ponnoju

Keywords:

data lineage, ETL logic, generative documentation, large language models, code parsing, onboarding efficiency, data governance

Abstract

Data lineage and Extract-Transform-Load (ETL) logic are integral parts of business data governance, yet documenting them remains a time-intensive, error-prone endeavour. This paper introduces large language models (LLMs) trained on SQL, procedural ETL frameworks, and metadata schemas that can automatically produce flowcharts, semantic data dictionaries, and logic walkthroughs from ETL source code repositories.
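To make the kind of output described above concrete, the following is a minimal sketch of deriving table-level lineage from an ETL statement and rendering it as a flowchart in Graphviz DOT. It is a rule-based illustration only, not the LLM-based approach the paper introduces. The function names `extract_lineage` and `to_dot`, the sample SQL, and the table names are all hypothetical, and the regular expressions handle only simple `INSERT INTO ... SELECT` statements.

```python
import re

# Hypothetical ETL statement used only for illustration.
SAMPLE_SQL = """
INSERT INTO warehouse.fact_orders
SELECT o.id, c.region, o.amount
FROM staging.orders o
JOIN staging.customers c ON o.customer_id = c.id;
"""


def extract_lineage(sql: str) -> list[tuple[str, str]]:
    """Return (source_table, target_table) edges for INSERT ... SELECT statements.

    Assumption: statements are separated by ';' and contain no subqueries,
    CTEs, or string literals with SQL keywords. Real ETL code needs a parser.
    """
    edges: list[tuple[str, str]] = []
    for statement in sql.split(";"):
        target = re.search(r"\bINSERT\s+INTO\s+([\w.]+)", statement, re.IGNORECASE)
        if target is None:
            continue  # not a load statement; nothing to record
        sources = re.findall(r"\b(?:FROM|JOIN)\s+([\w.]+)", statement, re.IGNORECASE)
        for source in sources:
            edge = (source, target.group(1))
            if edge not in edges:  # keep first occurrence, preserve order
                edges.append(edge)
    return edges


def to_dot(edges: list[tuple[str, str]]) -> str:
    """Render lineage edges as a Graphviz DOT digraph (source -> target)."""
    lines = ["digraph lineage {"]
    lines += [f'    "{src}" -> "{dst}";' for src, dst in edges]
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_dot(extract_lineage(SAMPLE_SQL)))
```

Running the script prints a DOT graph with one edge per source table, which any Graphviz renderer can turn into a lineage flowchart. A pattern-matching extractor like this breaks down on nested queries and procedural ETL code, which is the gap the abstract positions trained language models to fill.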

Published

07-10-2018

How to Cite

[1] Vasudevan Ananthakrishnan, Bhaskar Yakkanti, and Sai Charan Ponnoju, “Generative Documentation of Data Lineage and ETL Logic”, Art. Intel. Mach. Learn. Auto. Sys., vol. 2, p. 127, Oct. 2018, Accessed: May 23, 2026. [Online]. Available: https://amlas.net/index.php/publication/article/view/23