Robust Financial Crime Detection in Big Data via Uncertainty-Aware Deep Learning Techniques
Date Issued
February 2021
Author(s)
Advisor
Abstract
Taxation is one of the most important sources of revenue for the European Union,
and Value Added Tax (VAT) accounts for EUR 1.2T [1]; as such, it is a prevalent
target for tax evasion. The European Commission puts the difference
between the estimated and collected VAT (the VAT GAP) at EUR 147B, or 12.3%
of the VAT revenue [2].
It is unfortunate that many EU tax departments rely on outdated technology,
such as rule-based systems, to target high-yield taxpayers for audit in their effort
to decrease the VAT GAP. In addition, the absence of research in state-of-the-art
technology by the tax departments is surprising, and it means that they have not
benefited from advancements in intelligent systems.
This thesis draws inspiration from the most recent machine learning advances
in areas like visual recognition and speech perception. We seek to introduce
cutting-edge technology into the tax departments' arsenal against tax evasion. Specifically,
we target the selection of high-yield taxpayers for audit. In our work, we
rely on intelligently processed raw data obtained from available tax returns. The
high-dimensional nature of the available data calls for the development of machine
learning techniques that can learn to extract meaningful lower-dimensional
representations to drive the predictive inference process. We address these needs
in a comprehensive manner, yielding a novel set of supervised and semi-supervised
techniques. In all cases, we take special care to mitigate the epistemic uncertainty
our problem is fraught with, which results from the limited amount of audited
(labelled) data.
The success of this thesis would not have been possible without the wholehearted
assistance of the Cyprus Tax Department and the inspired mentoring of the
Taxation Commissioner, Mr Yiannis Tsangaris. Specifically, with their approval,
we were given anonymized access to over a million submitted VAT returns and
the tax audit results pertaining to the period 2013-2019. The availability of this
large corpus of real-world data was a crucial factor that allowed us to successfully
pursue our research goals.
File(s)
Name
Robust Financial Crime Detection in Big Data via Uncertainty-Aware.pdf
Size
699.96 KB
Format
Adobe PDF
Checksum (MD5)
456cbd84cb04b46216bd3966d0d9ceea
Name
Robust Financial Crime Detection in Big Data via Uncertainty-Aware - Abstract.pdf
Size
143.2 KB
Format
Adobe PDF
Checksum (MD5)
63f8d2c7b54be53a43b66bdabeb69b77

