diff --git a/doc/architecture.md b/doc/architecture.md
index e11d95db0f9d6c1fe4bd75df581c342bc013e512..dcfdb98eb70146d111c9888cab0a70c56153acc2 100644
--- a/doc/architecture.md
+++ b/doc/architecture.md
@@ -8,7 +8,7 @@ SOCTools is a collection of tools for collecting, enriching and analyzing logs a
 The high level architecture is shown in the figure above and consists of the following components:
 * Data sources - the platform supports data from many common sources like system logs, application logs, IDS etc. It is also simple to add support for other sources. The main method for sending data into SOCTools is through Filebeat.
 * High volume data sources - while the main platform is able to scale to high traffic volumes, it will in some cases be more convenient to have a separate setup for very high volume data like Netflow. Some NRENs might also have an existing setup for this kind of data that they do not want to change. Data sources like this will have its own storage system. If real time processing is done on the data, alerts from this can be shipped to other components in the architecture.
-* Data transport - [Apache Nifi](https://nifi.apache.org/), the key component that collects data from data sources, normalize it, do simple data enrichment and then ship it to one or more of the other components in the architecture.
+* Data transport - [Apache Nifi](https://nifi.apache.org/) is the key component that collects data from data sources, normalizes it, does simple data enrichment and then ships it to one or more of the other components in the architecture.
 * Storage - in the current version all storage is done in [Elasiticsearch](https://opendistro.github.io/for-elasticsearch/), but it is easy to make changes to the data transport so that data is sent to other log analysis tools like Splunk or Humio.
 * Manual analysis - In the current version [Kibana](https://opendistro.github.io/for-elasticsearch/) is used for manual analysis of collected data.
 * Enrichment - This component enriches the collected data either before or after storage. In the current version this is done as part of the data transport component before data is sent to storage.
@@ -33,7 +33,7 @@ This process group is basically a collection of "cron jobs" that runs regularly
 * Misp - NiFi automatically downloads new IOCs from the Misp instance that is part of SOCTools. IP addresses and host names are then enriched to show if they are registered in Misp.
 
 ### Data processing
-The processing group is divided into 3 parts:
+The processing group is split into 3 parts:
 * Data input - receives data, normalizes it and converts it to JSON. This also adds attributes to the data that specifies which filed names to enrich.
 * Enrichment - enriches the data. It currently supports enriching IP addresses, domain names and fully qualified domain name (FQDN).
 * Data output - sends data to storage. In future version data will also be sent to other tools doing real time stream processing of the data.
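
As a rough sketch of the data input and enrichment steps described in the changed section, the snippet below shows one way a normalized JSON event could carry attributes naming the fields to enrich and then be checked against MISP-derived IOCs. All field names, the `_enrich_fields` attribute and the IOC list are hypothetical assumptions for illustration; the actual processing is done by the NiFi flow, not by this code.

```python
import ipaddress
import json


def normalize(raw_event: dict) -> dict:
    """Convert a raw event to a JSON-style dict and tag fields for enrichment."""
    return {
        "timestamp": raw_event.get("ts"),
        "src_ip": raw_event.get("src"),
        "dest_host": raw_event.get("host"),
        "message": raw_event.get("msg", ""),
        # Hypothetical attribute telling the enrichment step which fields
        # contain IP addresses and which contain FQDNs.
        "_enrich_fields": {"ip": ["src_ip"], "fqdn": ["dest_host"]},
    }


def enrich(event: dict, ioc_ips: set) -> dict:
    """Mark IP fields that appear in a (hypothetical) MISP-derived IOC set."""
    for field in event.get("_enrich_fields", {}).get("ip", []):
        value = event.get(field)
        if value is None:
            continue
        try:
            ipaddress.ip_address(value)  # skip values that are not valid IPs
        except ValueError:
            continue
        event[f"{field}_in_misp"] = value in ioc_ips
    return event


if __name__ == "__main__":
    raw = {"ts": "2021-06-01T12:00:00Z", "src": "192.0.2.10",
           "host": "example.org", "msg": "login failed"}
    iocs = {"192.0.2.10"}  # stand-in for IOCs downloaded from MISP
    print(json.dumps(enrich(normalize(raw), iocs), indent=2))
```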