Buggana Sathvik, Lini Thomas, Kamal Karlapalem
Document Structure Analysis is a syntactic analysis problem in which the different sections of a document are extracted and parsed to study the relationships between the tokens they contain. There are around 48 regulatory documents available in the SEBI database. Each document applies to a specific financial sector, such as Mutual Funds or Investment Advisers, and contains around 25 regulations on average. These regulations are further grouped into chapters, each focused on a specific topic. In every document, the first chapter introduces the content, the second chapter defines certain financial terms used in the document, and the remaining chapters contain the regulations.
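The chapter layout described above can be captured in a small data model. The Python sketch below is an illustration only, assuming each document is already segmented into numbered chapters. The class and function names (`Chapter`, `RegulatoryDocument`, `chapter_role`, `count_regulations`) and the sample chapter titles are hypothetical. The role of each chapter comes only from its position: chapter 1 is the introduction, chapter 2 holds definitions, and every later chapter holds regulations.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Chapter:
    number: int                     # 1-based chapter index within the document
    title: str
    regulations: List[str] = field(default_factory=list)


@dataclass
class RegulatoryDocument:
    sector: str                     # e.g. "Mutual Funds"
    chapters: List[Chapter]


def chapter_role(chapter: Chapter) -> str:
    """Assign a role from chapter position, following the layout in the text."""
    if chapter.number == 1:
        return "introduction"
    if chapter.number == 2:
        return "definitions"
    return "regulations"


def regulation_chapters(doc: RegulatoryDocument) -> List[Chapter]:
    """Keep only the chapters that contain regulations (chapter 3 onwards)."""
    return [c for c in doc.chapters if chapter_role(c) == "regulations"]


def count_regulations(doc: RegulatoryDocument) -> int:
    """Total number of regulations across all regulation chapters."""
    return sum(len(c.regulations) for c in regulation_chapters(doc))


# Hypothetical document with made-up chapter titles and placeholder regulations.
doc = RegulatoryDocument(
    sector="Mutual Funds",
    chapters=[
        Chapter(1, "Preliminary"),
        Chapter(2, "Definitions"),
        Chapter(3, "Registration", ["reg-1", "reg-2"]),
        Chapter(4, "Investment", ["reg-3"]),
    ],
)
print([chapter_role(c) for c in doc.chapters])
print(count_regulations(doc))
```

Using position rather than chapter titles keeps the sketch independent of how each document names its chapters. It relies on the text's observation that the first two chapters always play the same role.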
Around 5500 regulations were extracted from the latest SEBI regulatory documents. These regulations follow an intrinsic pattern of the form Entity-Condition-Action, as seen in Table I. The Entity is the subject of the regulation. The Condition is the phrase that the entity must satisfy in order to perform the action. The Action is the rest of the regulation, which follows the rule-relevant words 'shall' or 'may'.
Template fitting aims to extract and restructure the regulations using the Entity-Condition-Action parts of each regulation.
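To show how a single regulation might be fitted to the Entity-Condition-Action template, the Python sketch below splits the sentence at the first whole-word 'shall' or 'may'. Everything after that word becomes the Action. A small list of cue words then separates the Entity from the Condition in the text before the modal. The cue-word list, the `Template` and `fit_template` names, and the example sentences are hypothetical assumptions made for illustration. This is not the authors' method. Real regulations, with nested clauses, provisos, or several modal verbs, would need a more robust parser.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Rule-relevant words named in the text, matched as whole words only.
MODAL_RE = re.compile(r"\b(shall|may)\b", re.IGNORECASE)

# Hypothetical cue words assumed to open a Condition phrase.
CONDITION_CUE_RE = re.compile(
    r"\b(who|which|that|if|where|unless|upon|before|after|having)\b", re.IGNORECASE
)


@dataclass
class Template:
    entity: str
    condition: str
    modal: str
    action: str


def fit_template(regulation: str) -> Optional[Template]:
    """Split one regulation into Entity-Condition-Action parts (heuristic sketch)."""
    text = " ".join(regulation.split())  # collapse line breaks and repeated spaces
    modal_match = MODAL_RE.search(text)
    if modal_match is None:
        return None  # no 'shall'/'may': the sentence does not fit the template
    head = text[: modal_match.start()].strip().rstrip(",")
    action = text[modal_match.end():].strip()

    cue = CONDITION_CUE_RE.search(head)
    if cue is None:
        entity, condition = head, ""  # no condition phrase found
    elif cue.start() == 0 and "," in head:
        # Leading condition, e.g. "Where ..., the Board": entity follows the last comma.
        condition, _, entity = head.rpartition(",")
    else:
        # Trailing condition, e.g. "An adviser who ...": entity precedes the cue word.
        entity, condition = head[: cue.start()], head[cue.start():]

    return Template(
        entity=entity.strip(),
        condition=condition.strip().rstrip(","),
        modal=modal_match.group(1).lower(),
        action=action,
    )


# Made-up example sentences, not quoted from the SEBI regulations.
print(fit_template("An investment adviser who holds a certificate of registration "
                   "shall comply with the code of conduct."))
print(fit_template("Where the applicant does not satisfy the conditions, "
                   "the Board may reject the application."))
```

Splitting on the modal verb first follows the text's definition of the Action as everything after 'shall' or 'may'. The Entity/Condition boundary is the weakest part of the heuristic and is where a syntactic parser would most likely be needed.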
Data: HERE