@@ -39,3 +39,14 @@ The processing group is divided into 3 parts:
* Data output - sends data to storage. In a future version, data will also be sent to other tools that do real-time stream processing.
Each group contains a process group called "Custom ..." where it is possible to add new processors to the pipeline that will not be overwritten when upgrading to newer versions of SOCTools.
## Performance
The two components that decide the performance of SOCTools are Elasticsearch and Apache NiFi. Both scale horizontally by adding more nodes to the cluster.
NiFi has been reported to scale to petabytes of data per day in a large cluster; see [Processing one billion events per second with NiFi](https://blog.cloudera.com/benchmarking-nifi-performance-and-scalability/). The performance of NiFi depends heavily on the type and number of processors in the pipeline. The enrichment pipeline used in SOCTools is quite CPU intensive, but it uses record-based processing in NiFi, which means that multiple log entries of the same type are grouped together to improve performance.
Uninett uses [Humio](https://www.humio.com/) instead of Elasticsearch for storing logs, but has a pilot installation of Apache NiFi running the same pipeline as the one in SOCTools. The current setup is 6 virtual servers running on 4 physical servers. The hardware specification of each virtual server is:
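The grouping idea can be illustrated with a minimal Python sketch. This is not NiFi code and not the SOCTools pipeline: the field name `type`, the sample entries, and the `enrich_batch` helper are hypothetical and only show why handling one batch per log type can be cheaper than handling each entry on its own.

```python
from collections import defaultdict


def group_by_type(entries):
    """Group log entries into one batch per log type."""
    batches = defaultdict(list)
    for entry in entries:
        batches[entry["type"]].append(entry)
    return dict(batches)


def enrich_batch(log_type, batch):
    """Hypothetical enrichment step: per-type setup happens once per batch."""
    tag = f"enriched:{log_type}"  # computed once, reused for every entry
    return [{**entry, "tag": tag} for entry in batch]


# Hypothetical sample data; real entries would carry many more fields.
entries = [
    {"type": "dns", "msg": "query example.org"},
    {"type": "netflow", "msg": "10.0.0.1 -> 10.0.0.2"},
    {"type": "dns", "msg": "query example.com"},
]

batches = group_by_type(entries)
enriched = {t: enrich_batch(t, b) for t, b in batches.items()}

for log_type, batch in enriched.items():
    print(log_type, len(batch))
```

In this sketch the per-type work runs twice (once for `dns`, once for `netflow`) rather than three times, and the saving grows with the batch size.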
* CPU: 12 cores
* Memory: 8GB
* Disk: 40GB
This setup processes around 7K events per second of production data during peak hours. During performance testing we were able to add a further 17K events per second of test traffic before NiFi started to show performance issues. This translates to more than 1.1TB of data per day.
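As a rough sanity check, the figures above can be combined in a few lines of Python. The sketch assumes that the 1.1TB per day corresponds to the combined 24K events per second sustained for a full day, that TB means 10^12 bytes, and that load is spread evenly over the 6 virtual servers. None of these assumptions is stated in the measurements, so the derived numbers are only indicative.

```python
SECONDS_PER_DAY = 24 * 60 * 60

production_eps = 7_000     # peak production events per second (from the text)
test_eps = 17_000          # additional test events per second (from the text)
nodes = 6                  # virtual servers in the pilot cluster
data_per_day_bytes = 1.1e12  # 1.1TB, assuming decimal terabytes

total_eps = production_eps + test_eps
events_per_day = total_eps * SECONDS_PER_DAY
eps_per_node = total_eps / nodes  # assumes an even spread across nodes
avg_event_bytes = data_per_day_bytes / events_per_day

print(f"total events/s:      {total_eps}")
print(f"events/day:          {events_per_day}")
print(f"events/s per node:   {eps_per_node:.0f}")
print(f"implied bytes/event: {avg_event_bytes:.0f}")
```

Under these assumptions the cluster handles about 2 billion events per day, roughly 4K events per second per virtual server, with an implied average event size of a little over 500 bytes.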