ISSN 2353-6977 (Online)

DETECTION OF SOURCE CODE IN INTERNET TEXTS USING AUTOMATICALLY GENERATED MACHINE LEARNING MODELS

Marcin BADUROWICZ

This paper presents the results of web scraping software that automatically classifies source code. The system was built for a discussion forum for software developers, to find fragments of source code that were published without being marked as code snippets. The analyzer uses a machine learning binary classification model to distinguish programming-language source code from highly technical text about software. The model was produced by an AutoML subsystem without human intervention or fine-tuning, and its accuracy on the described problem exceeds 95%. The analyzer based on the automatically generated model has been deployed, and after the first year of continuous operation its False Positive Rate is below 3%. A similar process could be introduced in document management within the software development process, where automatic tagging of, and search for, code or pseudo-code may be useful for archiving purposes.
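The two figures quoted in the abstract, accuracy and False Positive Rate, follow the standard definitions for a binary classifier. The sketch below is only an illustration of those definitions and is not taken from the paper: the names `ConfusionCounts`, `confusion_counts`, `accuracy` and `false_positive_rate` are hypothetical, and it assumes the label `True` stands for "source code" and `False` for "technical text".

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    tp: int = 0  # code fragments correctly flagged as code
    fp: int = 0  # technical text wrongly flagged as code
    tn: int = 0  # technical text correctly left unflagged
    fn: int = 0  # code fragments that were missed


def confusion_counts(y_true, y_pred):
    """Tally outcomes; True means 'source code' (an assumed convention)."""
    counts = ConfusionCounts()
    for actual, predicted in zip(y_true, y_pred):
        if predicted and actual:
            counts.tp += 1
        elif predicted and not actual:
            counts.fp += 1
        elif not predicted and not actual:
            counts.tn += 1
        else:
            counts.fn += 1
    return counts


def accuracy(c):
    """Share of all samples that were classified correctly."""
    total = c.tp + c.fp + c.tn + c.fn
    return (c.tp + c.tn) / total if total else 0.0


def false_positive_rate(c):
    """Share of genuine technical text that was wrongly flagged as code."""
    negatives = c.fp + c.tn
    return c.fp / negatives if negatives else 0.0
```

Under these definitions, a False Positive Rate below 3% would mean that fewer than 3 in 100 posts containing only technical text were flagged as code; it says nothing by itself about how many real code fragments were missed.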

HOW TO CITE THIS PAPER

APA 7th style

Badurowicz, M. (2022). Detection of source code in internet texts using automatically generated machine learning models. Applied Computer Science, 18(1), 89-98. https://doi.org/10.23743/acs-2022-07

Chicago style

Badurowicz, Marcin. "Detection of Source Code in Internet Texts Using Automatically Generated Machine Learning Models." Applied Computer Science 18, no. 1 (2022): 89-98.

IEEE style

M. Badurowicz, "Detection of source code in internet texts using automatically generated machine learning models," Applied Computer Science, vol. 18, no. 1, pp. 89-98, 2022, doi: 10.23743/acs-2022-07.

Vancouver style

Badurowicz M. Detection of source code in internet texts using automatically generated machine learning models. Applied Computer Science. 2022;18(1):89-98.

