diff --git a/README.md b/README.md
index 28d8e3457c5a520ce6ff7824afa5e77da8e18fbf..c0d2c7f9c6ec8737b720d64da054c1b04c19d582 100644
--- a/README.md
+++ b/README.md
@@ -8,12 +8,12 @@
 6. [Requirements Hardware](#requirements-hardware)
 7. [Requirements Software](#requirements-software)
 8. [HOWTO Install and Configure](#howto-install-and-configure)
-   * [Install Python 3.8.x](#install-python-38x)
+   * [Install Python 3.9.x](#install-python-39x)
      + [CentOS 7 requirements](#centos-7-requirements)
      + [Debian requirements](#debian-requirements)
-     + [Python 3.8](#python-38)
+     + [Python 3.9](#python-39)
 9. [Install the Chromedriver](#install-the-chromedriver)
-10. [Install Chromium needed by Selenium](#install-chromium-needed-by-selenium)
+10. [Install Google Chrome needed by Selenium](#install-google-chrome-needed-by-selenium)
 11. [ECCS2 Script](#eccs2-script)
    * [Install](#install)
    * [Configure](#configure)
@@ -84,12 +84,15 @@ The tool uses following status for IdPs:
 * HDD: 10 GB
 * RAM: 4 GB
 * CPU: >= 2 vCPU (suggested)
+* ARCH: 64 Bit

 # Requirements Software

 * Apache Server + WSGI
-* Python 3.8 (tested with v3.8.3,v3.8.5)
-* Selenim + Chromium Web Brower
+* Python 3.9 (tested with v3.9.6)
+* Selenium + Google Chrome Web Browser (tested with v91.0.4472.164)
+* Chromedriver (tested with v91.0.4472.101)
+* Git

 # HOWTO Install and Configure

@@ -97,7 +100,7 @@ The tool uses following status for IdPs:

 * `cd $HOME ; git clone https://github.com/malavolti/eccs2.git`

-## Install Python 3.8.x
+## Install Python 3.9.x

 ### CentOS 7 requirements

@@ -110,6 +113,9 @@ The tool uses following status for IdPs:
 3. Install needed packages to build python:
    * `sudo yum -y install openssl-devel bzip2-devel libffi-devel wget`

+4. Install Git:
+   * `sudo yum -y install git`
+
 ### Debian requirements

 1. Update the system packages:
@@ -118,45 +124,49 @@ The tool uses following status for IdPs:
 2. Install needed packages to build python:
    * `sudo apt install build-essential zlib1g-dev libncurses5-dev libgdbm-dev libnss3-dev libssl-dev libreadline-dev libffi-dev`

-### Python 3.8
+3. Install Git:
+   * `sudo apt install git`
+
+### Python 3.9

-1. Download the last version of Python 3.8.x from https://www.python.org/downloads/source/ into your home:
-   * `wget https://www.python.org/ftp/python/3.8.5/Python-3.8.5.tgz -O $HOME/eccs2/Python-3.8.5.tgz`
+1. Download the last version of Python 3.9.x from https://www.python.org/downloads/source/ into your home:
+   * `wget https://www.python.org/ftp/python/3.9.6/Python-3.9.6.tgz -O $HOME/eccs2/Python-3.9.6.tgz`

 2. Extract Python source package:
    * `cd $HOME/eccs2/`
-   * `tar xzf Python-3.8.5.tgz`
+   * `tar xzf Python-3.9.6.tgz`

 3. Build Python from the source package:
-   * `cd $HOME/eccs2/Python-3.8.5`
+   * `cd $HOME/eccs2/Python-3.9.6`
    * `./configure --prefix=$HOME/eccs2/python`
    * `make`

-4. Install Python 3.8.x under `$HOME/eccs2/python`:
+4. Install Python 3.9.x under `$HOME/eccs2/python`:
    * `make install`
-   * `$HOME/eccs2/python/bin/python3.8 --version`
+   * `$HOME/eccs2/python/bin/python3.9 --version`

    This will install python under your $HOME directory.

 5. Remove useless things:
-   * `rm -Rf $HOME/eccs2/Python-3.8.5 $HOME/eccs2/Python-3.8.5.tgz`
+   * `rm -Rf $HOME/eccs2/Python-3.9.6 $HOME/eccs2/Python-3.9.6.tgz`

-# Install Chromium needed by Selenium
+# Install Google Chrome needed by Selenium

-* Debian:
-   * `sudo apt install chromium git jq`
+* Debian (64 bit):
+   * `sudo wget https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb`
+   * `sudo apt install ./google-chrome-stable_current_amd64.deb`

-* CentOS:
-   * `sudo yum install -y epel-release`
-   * `sudo yum install -y chromium git jq`
+* CentOS (64 bit):
+   * `sudo wget https://dl.google.com/linux/direct/google-chrome-stable_current_x86_64.rpm`
+   * `sudo yum install ./google-chrome-stable_current_x86_64.rpm`

 # Install the Chromedriver

 1. Find out which version of Chromium you are using:
    * Debian 9 (stretch):
-     * `chromium -version` => Chromium 73.0.3683.75
+     * `google-chrome -version` => Google Chrome 91.0.4472.164
    * CentOS 7.8:
-     * `chromium-browser -version` => Chromium 83.0.4103.116
+     * `google-chrome -version` => Google Chrome 91.0.4472.164

 2. Take the Chrome version number, remove the last part, and append the result to URL "`https://chromedriver.storage.googleapis.com/LATEST_RELEASE_`". For example, with Chrome version 73.0.3683.75, you'd get a URL "`https://chromedriver.storage.googleapis.com/LATEST_RELEASE_73.0.3683`".
@@ -178,8 +188,8 @@ After the initial download, it is recommended that you occasionally go through t
 ## Install

 * `cd $HOME/eccs2`
-* `./python/bin/python3.8 -m pip install virtualenv`
-* `$HOME/eccs2/python/bin/virtualenv --python=$HOME/eccs2/python/bin/python3.8 eccs2venv`
+* `./python/bin/python3.9 -m pip install virtualenv`
+* `$HOME/eccs2/python/bin/virtualenv --python=$HOME/eccs2/python/bin/python3.9 eccs2venv`
 * `source eccs2venv/bin/activate` (`deactivate` to exit Virtualenv)
 * `python -m pip install -r requirements.txt`

@@ -228,11 +238,11 @@ After the initial download, it is recommended that you occasionally go through t
    * `./runEccs2.py --idp <IDP-ENTITYID>` (to run check on a single IdP)
    * `./runEccs2.py --test` (to run a full check without effects)
    * `./runEccs2.py --idp <IDP-ENTITYID> --test` (to run check on a single IdP without effects)
+   * `./runEccs2.py --idp <IDP-ENTITYID> --replace` (to run check on a single IdP and replace, or add, a result)

-   The check will run a second time for those IdPs that failed the first execution of the script.
    If something prevent the good execution of the ECCS2's check, the `logs/failed-cmd.sh` file will be not empty at the end of the execution.

-   The "--test" parameter will not change the result of ECCS2, but will write the output on the `logs/stdout_idp_YYYY-MM-DD.log`,`logs/stderr_idp_YYYY-MM-DD.log` and `logs/failed-cmd-idp.sh` files.
+   The "--test" parameter will not change the result of ECCS2, but will write the output on the `logs/stdout_idp_YYYY-MM-DD.log`,`logs/stderr_idp_YYYY-MM-DD.log` and `logs/failed-cmd-idp.sh` files if the argument "--test" will be used.

 # ECCS2 API Server (uWSGI)

diff --git a/api.py b/api.py
index 58d0c65006854b12e8f8fccf435c75ff5e6899b4..32c478304f5a5a65bdefb6bf2810b8c1cb06e6c8 100755
--- a/api.py
+++ b/api.py
@@ -7,7 +7,7 @@ import re
 from eccs2properties import DAY, ECCS2LOGSDIR, ECCS2OUTPUTDIR, ECCS2LISTFEDSURL, ECCS2LISTFEDSFILE
 from flask import Flask, request, jsonify
 from flask_restful import Resource, Api
-from utils import getLogger, getListFeds, getRegAuthDict
+from utils import get_logger, get_list_feds, get_reg_auth_dict

 app = Flask(__name__)
 api = Api(app)
@@ -175,8 +175,8 @@ class EccsResults(Resource):
 # /api/fedstats
 class FedStats(Resource):
     def get(self):
-        list_feds = getListFeds(ECCS2LISTFEDSURL, ECCS2LISTFEDSFILE)
-        regAuthDict = getRegAuthDict(list_feds)
+        list_feds = get_list_feds(ECCS2LISTFEDSURL, ECCS2LISTFEDSFILE)
+        regAuthDict = get_reg_auth_dict(list_feds)

         file_path = "%s/eccs2_%s.log" % (ECCS2OUTPUTDIR,DAY)
         date = DAY
@@ -263,5 +263,5 @@ if __name__ == '__main__':
     # Useful only for API development Server
     #app.config['JSON_AS_ASCII'] = True
     #app.logger.removeHandler(default_handler)
-    #app.logger = getLogger("eccs2api.log", ECCS2LOGSDIR, "w", "INFO")
+    #app.logger = get_logger("eccs2api.log", ECCS2LOGSDIR, "w", "INFO")
     app.run(port='5002')
diff --git a/cleanAndRunEccs2.sh b/cleanAndRunEccs2.sh
index 58b981c7b8eb572e425b9d7910565eecb3e7820f..dbc494d5edc1dc5bc8518f5a21d32bc702188a37 100755
--- a/cleanAndRunEccs2.sh
+++ b/cleanAndRunEccs2.sh
@@ -11,61 +11,3 @@ rm -f $BASEDIR/eccs2/input/*.json

 # Run ECCS2
 $BASEDIR/eccs2/runEccs2.py
-
-# Run Failed Command again
-bash $BASEDIR/eccs2/logs/failed-cmd.sh
-
-date=$(date '+%Y-%m-%d')
-file="$BASEDIR/eccs2/logs/failed-cmd.sh"
-prefix="$BASEDIR/eccs2/eccs2.py '"
-suffix="'"
-eccs2output="$BASEDIR/eccs2/output/eccs2_$date.log"
-declare -a eccs2cmdToRemoveArray
-
-# If the ECCS2 output contains the result of a failed command (failed-cmd.sh),
-# than remove the failed command from the failed-cmd.sh file
-if [ -s $eccsoutput ]; then
-    if [ -s $file ]; then
-        while IFS= read -r line
-        do
-            string=$line
-
-            #remove "prefix" from the command string at the beginning.
-            prefix_removed_string=${string/#$prefix}
-
-            #remove "suffix" from the command string at the end.
-            suffix_removed_string=${prefix_removed_string/%$suffix}
-
-            entityIDidp=$(echo "$suffix_removed_string" | jq '.entityID')
-
-            #remove start and end quotes from the entityIDidp to be able to use "grep"
-            entityIDidp="${entityIDidp:1}"
-            entityIDidp="${entityIDidp%?}"
-
-            result=$(grep $entityIDidp $eccs2output | wc -l)
-
-            if [[ "$result" = 1 ]]; then
-                eccs2cmdToRemoveArray+=("$entityIDidp")
-            else
-                echo "The result for the IdP '$entityIDidp' has been found multiple times on $eccs2output. It is wrong."
-            fi
-
-        done <"$file"
-
-        # Remove IdP command that had success from "failed-cmd.sh"
-        for idpToRemove in ${eccs2cmdToRemoveArray[@]}
-        do
-            $(grep -v $idpToRemove $file > temp ; mv -f temp $file)
-        done
-
-        if [ -s $file ]; then
-            echo "$date - ECCS2 NOT OK: Some eduGAIN IdPs have remained unchecked. See the 'logs/failed-cmd.sh' and logs/stderr_$date.log files"
-        else
-            echo "$date - ECCS2 OK: All eduGAIN IdPs have been checked successfully"
-        fi
-    else
-        echo "$date - ECCS2 OK: All eduGAIN IdPs have been checked successfully"
-    fi
-else
-    echo "$date - Something went wrong and the ECCS2 check has not been executed"
-fi
diff --git a/eccs2.py b/eccs2.py
index 02c76d1da8bcd4efd63a630bb864bfd1203fc3c8..0777b840ee666be95f67af11c1e929a34db71345 100755
--- a/eccs2.py
+++ b/eccs2.py
@@ -1,212 +1,20 @@
 #!/usr/bin/env python3

 import argparse
-import datetime
 import json
-import re
-import requests
 import sys
+import utils
+import eccs2properties as e2p

-from eccs2properties import DAY, ECCS2HTMLDIR, ECCS2OUTPUTDIR, ECCS2RESULTSLOG, ECCS2SPS, ECCS2SELENIUMDEBUG,ROBOTS_USER_AGENT,ECCS2REQUESTSTIMEOUT, FEDS_DISABLED_DICT, IDPS_DISABLED_DICT, ECCS2SELENIUMPAGELOADTIMEOUT
 from pathlib import Path
-from selenium.common.exceptions import TimeoutException
-from urllib3.util import parse_url
-from utils import getLogger, getIdPContacts, getDriver
-

 """
 The check works with the wayfless url of two SP and successed if the IdP Login Page appears and contains the fields "username" and "password" for each of them.
-It is possible to disable the check by eccs2properties with *denylist or by "robots.txt" put on the SAMLRequest endpoint root web dir.
+It is possible to disable the check by eccs2properties IDPS_DISABLED_DICT or by "robots.txt" put on the SAMLRequest endpoint root web dir.
 """

-# Returns the FQDN to use on the HTML page_source files
-def getIDPlabel(url_or_urn):
-    if url_or_urn.startswith('http'):
-        return parse_url(url_or_urn)[2]
-    else:
-        return url_or_urn.split(":")[-1]
-
-def getIDPfqdn(samlrequest_url):
-    return getIDPlabel(samlrequest_url)
-
-# This function checks if an IdP recognized the SP by presenting its Login page with "username" and "password" fields.
-# It is possible to disable the check on eccs2properties with the *denylist or by "robots.txt" file into the SAMLRequest endpoint root web dir.
-# If the IdP Login page contains "username" and "password" fields the test is passed.
-def checkIdP(sp,idp,test):
-
-    # Disable SSL requests warning messages
-    requests.packages.urllib3.disable_warnings()
-
-    debug_selenium = ECCS2SELENIUMDEBUG
-    label_idp = getIDPlabel(idp['entityID'])
-    # WebDriver MUST be instanced here to avoid problems with SESSION
-    driver = getDriver(label_idp,debug_selenium)
-
-    # Exception of WebDriver raises
-    if (driver == None):
-        return None
-
-    # Configure Blacklists
-    federations_disabled_dict = FEDS_DISABLED_DICT
-    idps_disabled_dict = IDPS_DISABLED_DICT
-
-    fqdn_sp = parse_url(sp)[2]
-    wayfless_url = sp + idp['entityID']
-
-    robots = ""
-
-    if (idp['registrationAuthority'] in federations_disabled_dict.keys()):
-        check_time = datetime.datetime.utcnow().strftime('%Y-%m-%dT%H:%M:%S') + 'Z'
-
-        if (test is not True):
-            with open("%s/%s/%s---%s.html" % (ECCS2HTMLDIR,DAY,label_idp,fqdn_sp),"w") as html:
-                html.write("%s" % federations_disabled_dict[idp['registrationAuthority']])
-        else:
-            print("%s" % federations_disabled_dict[idp['registrationAuthority']])
-
-        return (idp['entityID'],wayfless_url,check_time,"NULL","DISABLED")
-
-    if (idp['entityID'] in idps_disabled_dict.keys()):
-        check_time = datetime.datetime.utcnow().strftime('%Y-%m-%dT%H:%M:%S') + 'Z'
-
-        if (test is not True):
-            with open("%s/%s/%s---%s.html" % (ECCS2HTMLDIR,DAY,label_idp,fqdn_sp),"w") as html:
-                html.write("%s" % idps_disabled_dict[idp['entityID']])
-        else:
-            print("%s" % idps_disabled_dict[idp['entityID']])
-
-        return (idp['entityID'],wayfless_url,check_time,"NULL","DISABLED")
-
-    # Open SP via wayfless_url and reach the IdP login page to check
-    try:
-        check_time = datetime.datetime.utcnow().strftime('%Y-%m-%dT%H:%M:%S') + 'Z'
-        driver.get(wayfless_url)
-        page_source = driver.page_source
-        samlrequest_url = driver.current_url
-
-        if (test is not True):
-            # Put the page_source into an appropriate HTML file
-            with open("%s/%s/%s---%s.html" % (ECCS2HTMLDIR,DAY,label_idp,fqdn_sp),"w") as html:
-                html.write(page_source)
-        else:
-            print("\n[page_source of '%s' for sp '%s']\n%s" % (label_idp,fqdn_sp,page_source))
-
-    except TimeoutException as e:
-        if (test is not True):
-            # Put an empty string into the page_source file
-            with open("%s/%s/%s---%s.html" % (ECCS2HTMLDIR,DAY,label_idp,fqdn_sp),"w") as html:
-                html.write("<html><h1>The IdP Login page was not loaded within %d seconds.</h1></html>" % ECCS2SELENIUMPAGELOADTIMEOUT )
-        else:
-            print("\n[page_source of '%s' for sp '%s']\nNo source code" % (label_idp,fqdn_sp))
-        return (idp['entityID'],wayfless_url,check_time,"(failed)","Timeout")
-
-    except Exception as e:
-        print ("!!! EXCEPTION DRIVER !!!")
-        print (e.__str__())
-        print ("IdP: %s\nSP: %s" % (idp['entityID'],sp))
-        return None
-
-    finally:
-        driver.quit()
-
-    try:
-        headers = {
-            'User-Agent': '%s' % ROBOTS_USER_AGENT
-        }
-
-        fqdn_idp = getIDPfqdn(samlrequest_url)
-
-        robots = requests.get("https://%s/robots.txt" % fqdn_idp, headers=headers, verify=True, timeout=ECCS2REQUESTSTIMEOUT)
-
-        if (robots == ""):
-            robots = requests.get("http://%s/robots.txt" % fqdn_idp, headers=headers, verify=False, timeout=ECCS2REQUESTSTIMEOUT)
-
-    # Catch SSL Exceptions and block the ECCS check
-    except (requests.exceptions.SSLError) as e:
-        check_time = datetime.datetime.utcnow().strftime('%Y-%m-%dT%H:%M:%S') + 'Z'
-
-        if (test is not True):
-            with open("%s/%s/%s---%s.html" % (ECCS2HTMLDIR,DAY,label_idp,fqdn_sp),"w") as html:
-                html.write("<p>IdP excluded from check due the following SSL Error:<br/><br/>%s</p><p>Check it on SSL Labs: <a href='https://www.ssllabs.com/ssltest/analyze.html?d=%s'>Click Here</a></p>" % (e.__str__(),fqdn_idp))
-        else:
-            print("IdP excluded from check due the following SSL Error:\n\n%s\n\nCheck it on SSL Labs: https://www.ssllabs.com/ssltest/analyze.html?d=%s" % (e.__str__(),fqdn_idp))
-
-        return (idp['entityID'],wayfless_url,check_time,"(failed)","SSL-Error")
-
-    # Pass every other exceptions on /robots.txt file. Consider only SSL Exceptions.
-    except Exception as e:
-        #print("IdP '%s' HAD HAD A REQUEST ERROR: %s" % (fqdn_idp,e.__str__()))
-        robots = ""
-
-    if (robots):
-        check_time = datetime.datetime.utcnow().strftime('%Y-%m-%dT%H:%M:%S') + 'Z'
-
-        p = re.compile('^User-agent:\sECCS\sDisallow:\s\/\s*$', re.MULTILINE)
-        m = p.search(robots.text)
-
-        if (m):
-            if (test is not True):
-                with open("%s/%s/%s---%s.html" % (ECCS2HTMLDIR,DAY,label_idp,fqdn_sp),"w") as html:
-                    html.write("IdP excluded from check by robots.txt")
-            else:
-                print("IdP excluded from check by robots.txt")
-
-            return (idp['entityID'],wayfless_url,check_time,"NULL","DISABLED")
-
-
-    pattern_metadata = "Unable.to.locate(\sissuer.in|).metadata(\sfor|)|no.metadata.found|profile.is.not.configured.for.relying.party|Cannot.locate.entity|fail.to.load.unknown.provider|does.not.recognise.the.service|unable.to.load.provider|Nous.n'avons.pas.pu.(charg|charger).le.fournisseur.de service|Metadata.not.found|application.you.have.accessed.is.not.registered.for.use.with.this.service|Message.did.not.meet.security.requirements"
-
-    pattern_username = '<input[\s]+[^>]*((type=\s*[\'"](text|email)[\'"]|user)|(name=\s*[\'"](name)[\'"]))[^>]*>';
-    pattern_password = '<input[\s]+[^>]*(type=\s*[\'"]password[\'"]|password)[^>]*>';
-
-    metadata_not_found = re.search(pattern_metadata,page_source, re.I)
-    username_found = re.search(pattern_username,page_source, re.I)
-    password_found = re.search(pattern_password,page_source, re.I)
-
-    try:
-        headers = {'User-Agent':'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.61 Safari/537.36'}
-        http_code = str(requests.get(samlrequest_url, headers=headers, verify=False, timeout=ECCS2REQUESTSTIMEOUT).status_code)
-
-    except requests.exceptions.ConnectionError as e:
-        print ("http-code: (failed) - ConnectionError for IdP '%s' with SP '%s'" % (idp['entityID'],sp))
-        #print("!!! REQUESTS HTTP CODE CONNECTION ERROR EXCEPTION !!!")
-        #print (e.__str__())
-        http_code = "(failed)"
-
-    except requests.exceptions.Timeout as e:
-        print ("http-code: (failed) - TimeoutError for IdP '%s' with SP '%s'" % (idp['entityID'],sp))
-        #print("!!! REQUESTS HTTP CODE TIMEOUT EXCEPTION !!!")
-        #print (e.__str__())
-        http_code = "(failed)"
-
-    except requests.exceptions.TooManyRedirects as e:
-        print ("http-code: (failed) - TooManyRedirectsError for IdP '%s' with SP '%s'" % (idp['entityID'],sp))
-        #print("!!! REQUESTS HTTP CODE TOO MANY REDIRECTS EXCEPTION !!!")
-        #print (e.__str__())
-        http_code = "(failed)"
-
-    except requests.exceptions.RequestException as e:
-        print ("http-code: (failed) - RequestException for IdP '%s' with SP '%s'" % (idp['entityID'],sp))
-        #print ("!!! REQUESTS EXCEPTION !!!")
-        print (e.__str__())
-        http_code = "(failed)"
-
-    except Exception as e:
-        print ("http-code: (failed) - OtherException for IdP '%s' with SP '%s'" % (idp['entityID'],sp))
-        #print ("!!! EXCEPTION REQUESTS !!!")
-        print (e.__str__())
-        http_code = "(failed)"
-
-    if(metadata_not_found):
-        return (idp['entityID'],wayfless_url,check_time,http_code,"No-eduGAIN-Metadata")
-    elif not username_found or not password_found:
-        return (idp['entityID'],wayfless_url,check_time,http_code,"Invalid-Form")
-    else:
-        return (idp['entityID'],wayfless_url,check_time,http_code,"OK")
-
-
 # Extract IdP DisplayName by fixing input string
-def getDisplayName(display_name):
+def get_display_name(display_name):
     display_name_equal_splitted = display_name.split('==')
     for elem in display_name_equal_splitted:
         if "en" in elem:
@@ -216,39 +24,40 @@ def getDisplayName(display_name):
             elem = elem.replace('"','\\"')
             return elem.split(';', 1)[1]

-
 # Append the result of the check on a file
-def storeECCS2result(idp,check_results,idp_status,test):
+def store_eccs_result(idp,sp,check_results,idp_status,test):

     # Build the contacts lists: technical/support
-    list_technical_contacts = getIdPContacts(idp,'technical')
-    list_support_contacts = getIdPContacts(idp,'support')
+    list_technical_contacts = utils.get_idp_contacts(idp,'technical')
+    list_support_contacts = utils.get_idp_contacts(idp,'support')

     str_technical_contacts = ','.join(list_technical_contacts)
     str_support_contacts = ','.join(list_support_contacts)

-    if (test is not True):
-        # IdP-DisplayName;IdP-entityID;IdP-RegAuth;IdP-tech-ctc-1,IdP-tech-ctc-2;IdP-supp-ctc-1,IdP-supp-ctc-2;IdP-ECCS-Status;SP-wayfless-url-1;SP-check-time-1;SP-http-code-1;SP-result-1;SP-wayfless-url-2;SP-check-time-2;SP-http-code-2;SP-result-2
-        with open("%s/%s" % (ECCS2OUTPUTDIR,ECCS2RESULTSLOG), 'a') as f:
-            f.write('{"displayName":"%s","entityID":"%s","registrationAuthority":"%s","contacts":{"technical":"%s","support":"%s"},"status":"%s","sp1":{"wayflessUrl":"%s","checkTime":"%s","httpCode":"%s","checkResult":"%s"},"sp2":{"wayflessUrl":"%s","checkTime":"%s","httpCode":"%s","checkResult":"%s"}}\n' % (
-                    getDisplayName(idp['displayname']), # IdP-DisplayName
-                    idp['entityID'], # IdP-entityID
-                    idp['registrationAuthority'], # IdP-RegAuth
-                    str_technical_contacts, # IdP-TechCtcsList
-                    str_support_contacts, # IdP-SuppCtcsList
-                    idp_status, # IdP-ECCS-Status
-                    check_results[0][1], # SP-wayfless-url-1
-                    check_results[0][2], # SP-check-time-1
-                    check_results[0][3], # SP-http-code-1
-                    check_results[0][4], # SP-check-result-1
-                    check_results[1][1], # SP-wayfless-url-2
-                    check_results[1][2], # SP-check-time-2
-                    check_results[1][3], # SP-http-code-2
-                    check_results[1][4])) # SP-check-result-2
+    if (test):
+        sys.stdout.write("\nECCS2:")
+        sys.stdout.write('{"displayName":"%s","entityID":"%s","registrationAuthority":"%s","contacts":{"technical":"%s","support":"%s"},"status":"%s","sp1":{"wayflessUrl":"%s","checkTime":"%s","httpCode":"%s","checkResult":"%s"},"sp2":{"wayflessUrl":"%s","checkTime":"%s","httpCode":"%s","checkResult":"%s"}}\n' % (
+            get_display_name(idp['displayname']), # IdP-DisplayName
+            idp['entityID'], # IdP-entityID
+            idp['registrationAuthority'], # IdP-RegAuth
+            str_technical_contacts, # IdP-TechCtcsList
+            str_support_contacts, # IdP-SuppCtcsList
+            idp_status, # IdP-ECCS-Status
+            check_results[0][1], # SP-wayfless-url-1
+            check_results[0][2], # SP-check-time-1
+            check_results[0][3], # SP-http-code-1
+            check_results[0][4], # SP-check-result-1
+            check_results[1][1], # SP-wayfless-url-2
+            check_results[1][2], # SP-check-time-2
+            check_results[1][3], # SP-http-code-2
+            check_results[1][4])) # SP-check-result-2
+
     else:
-        print("\nECCS2:")
-        print('{"displayName":"%s","entityID":"%s","registrationAuthority":"%s","contacts":{"technical":"%s","support":"%s"},"status":"%s","sp1":{"wayflessUrl":"%s","checkTime":"%s","httpCode":"%s","checkResult":"%s"},"sp2":{"wayflessUrl":"%s","checkTime":"%s","httpCode":"%s","checkResult":"%s"}}\n' % (
-                    getDisplayName(idp['displayname']), # IdP-DisplayName
+        # IdP-DisplayName;IdP-entityID;IdP-RegAuth;IdP-tech-ctc-1,IdP-tech-ctc-2;IdP-supp-ctc-1,IdP-supp-ctc-2;IdP-ECCS-Status;SP-wayfless-url-1;SP-check-time-1;SP-http-code-1;SP-result-1;SP-wayfless-url-2;SP-check-time-2;SP-http-code-2;SP-result-2
+        with open(f"{e2p.ECCS2OUTPUTDIR}/{e2p.ECCS2RESULTSLOG}", 'a') as f:
+            try:
+                f.write('{"displayName":"%s","entityID":"%s","registrationAuthority":"%s","contacts":{"technical":"%s","support":"%s"},"status":"%s","sp1":{"wayflessUrl":"%s","checkTime":"%s","httpCode":"%s","checkResult":"%s"},"sp2":{"wayflessUrl":"%s","checkTime":"%s","httpCode":"%s","checkResult":"%s"}}\n' % (
+                    get_display_name(idp['displayname']), # IdP-DisplayName
                     idp['entityID'], # IdP-entityID
                     idp['registrationAuthority'], # IdP-RegAuth
                     str_technical_contacts, # IdP-TechCtcsList
@@ -261,45 +70,55 @@ def storeECCS2result(idp,check_results,idp_status,test):
                     check_results[1][1], # SP-wayfless-url-2
                     check_results[1][2], # SP-check-time-2
                     check_results[1][3], # SP-http-code-2
-                    check_results[1][4])) # SP-check-result-2
-
+                    check_results[1][4] # SP-check-result-2
+                    )
+                )
+            except IOError:
+                sys.stderr.write(f"Failed writing result on output file for {idp['entityID']} with {utils.get_label(sp)}.\n\nRun {e2p.ECCS2DIR}/runEccs2.py --idp {idp['entityID']} --replace\n")
+                sys.exit(1)

 # Check an IdP with 2 SPs.
-def check(idp,sps,test):
+def check(idp,test):
     check_results = []
-    for sp in sps:
-        result = checkIdP(sp,idp,test)
-        if result is not None:
+    for sp in e2p.ECCS2SPS:
+        result = utils.check_idp_response_selenium(sp,idp,test)
+        if (result):
             check_results.append(result)
-
-    if len(check_results) == 2:
+        else:
+            sys.stderr.write(f"\nCheck failed for {idp['entityID']} with {utils.get_label(sp)}.\n\nRun {e2p.ECCS2DIR}/runEccs2.py --idp {idp['entityID']} --replace\n")
+            sys.exit(1)
+
+    if (len(check_results) == len(e2p.ECCS2SPS)):
         check_result_sp1 = check_results[0][4]
         check_result_sp2 = check_results[1][4]
+        check_result_weberr1 = check_results[0][5]
+        check_result_weberr2 = check_results[1][5]

         # If all checks are 'OK', than the IdP consuming correctly eduGAIN Metadata.
         if (check_result_sp1 == check_result_sp2 == "OK"):
-            storeECCS2result(idp,check_results,'OK',test)
+            store_eccs_result(idp,sp,check_results,'OK',test)

         elif (check_result_sp1 == check_result_sp2 == "DISABLED"):
-            storeECCS2result(idp,check_results,'DISABLED',test)
+            store_eccs_result(idp,sp,check_results,'DISABLED',test)

         else:
-            storeECCS2result(idp,check_results,'ERROR',test)
-
+            store_eccs_result(idp,sp,check_results,'ERROR',test)

 # MAIN
 if __name__=="__main__":

-    sps = ECCS2SPS
-
     parser = argparse.ArgumentParser(description='Checks if the input IdP consumed correctly eduGAIN metadata by accessing two different SPs')
     parser.add_argument("idpJson", metavar="idpJson", nargs=1, help="An IdP in Json format")
     parser.add_argument("--test", action='store_true', help="Test the IdP without effects")
+    parser.add_argument("--replace", action='store_true', help="Check an IdP and replace the result")

     args = parser.parse_args()

     idp = json.loads(args.idpJson[0])

-    Path("%s/%s" % (ECCS2HTMLDIR,DAY)).mkdir(parents=True, exist_ok=True) # Create dir needed to page_source content
+    Path(f"{e2p.ECCS2HTMLDIR}/{e2p.DAY}").mkdir(parents=True, exist_ok=True) # Create dir needed to page_source content
+
+    if (args.replace and not args.test):
+        utils.delete_line_with_word(f"{e2p.ECCS2OUTPUTDIR}/{e2p.ECCS2RESULTSLOG}",idp['entityID'])

-    check(idp,sps,args.test)
+    check(idp,args.test)
diff --git a/eccs2properties.py b/eccs2properties.py
index f85f8fb368d2d2786a055fdc26188122143a3e66..3dd995a626433645183b8d51690f6b71c27a9787 100644
--- a/eccs2properties.py
+++ b/eccs2properties.py
@@ -22,9 +22,9 @@ ECCS2HTMLDIR = "%s/html" % ECCS2DIR
 # Selenium
 ECCS2SELENIUMDEBUG = False
 ECCS2SELENIUMLOGDIR = "%s/selenium-logs" % ECCS2DIR
-ECCS2SELENIUMPAGELOADTIMEOUT = 60 #seconds
-ECCS2SELENIUMSCRIPTTIMEOUT = 60 #seconds
-ECCS2REQUESTSTIMEOUT = 60 #seconds
+ECCS2SELENIUMPAGELOADTIMEOUT = 30 #seconds
+ECCS2SELENIUMSCRIPTTIMEOUT = 30 #seconds
+ECCS2REQUESTSTIMEOUT = 15 #seconds

 # Logs
 ECCS2LOGSDIR = "%s/logs" % ECCS2DIR
@@ -36,14 +36,22 @@ ECCS2STDERRIDP = "%s/stderr_idp_%s.log" % (ECCS2LOGSDIR,DAY)
 ECCS2FAILEDCMDIDP = "%s/failed-cmd-idp.sh" % ECCS2LOGSDIR

 # Number of processes to run in parallel
-ECCS2NUMPROCESSES = 10
+ECCS2NUMPROCESSES = 35

 # The 2 SPs that will be used to test each IdP
-#ECCS2SPS = ["https://sp24-test.garr.it/Shibboleth.sso/Login?entityID=", "https://attribute-viewer.aai.switch.ch/Shibboleth.sso/Login?entityID="]
-ECCS2SPS = ["https://sp-demo.idem.garr.it/Shibboleth.sso/Login?entityID=", "https://attribute-viewer.aai.switch.ch/Shibboleth.sso/Login?entityID="]
+ECCS2SPS = [
+    "https://sp-demo.idem.garr.it/Shibboleth.sso/Login?entityID=",
+    "https://attribute-viewer.aai.switch.ch/interfederation-test/Shibboleth.sso/Login?entityID="
+]

 # ROBOTS.TXT
-ROBOTS_USER_AGENT = "ECCS/2.0 (+https://technical.edugain.org/eccs2)"
+ROBOTS_USER_AGENT = "ECCS/2.0 (+https://technical-test.edugain.org/eccs2)"
+
+# PATTERNS
+METADATAPATTERN = "Unable.to.locate(\sissuer.in|).metadata(\sfor|)|no.metadata.found|profile.is.not.configured.for.relying.party|Cannot.locate.entity|fail.to.load.unknown.provider|does.not.recognise.the.service|unable.to.load.provider|Nous.n'avons.pas.pu.(charg|charger).le.fournisseur.de service|Metadata.not.found|application.you.have.accessed.is.not.registered.for.use.with.this.service|Message.did.not.meet.security.requirements|Unsupported.Request|Not.Authorized"
+USERNAMEPATTERN = '<input[\s]+[^>]*((type=\s*[\'"](text|email)[\'"]|user)|(name=\s*[\'"](name)[\'"]))[^>]*>'
+PASSWORDPATTERN = '<input[\s]+[^>]*(type=\s*[\'"]password[\'"]|password)[^>]*>'
+REFUSEDPATTERN = '(^http)(.*\.png$)|(.*\.css$)|(.*\.js$)|(.*\.gif$)|(.*\.svg$)|(.*\.jpg$)'

 # { 'reg_auth':'reason' }
 FEDS_DISABLED_DICT = {
@@ -74,5 +82,6 @@ IDPS_DISABLED_DICT = {
    'https://sso.vu.lt/SSO/saml2/idp/metadata.php':'Disabled on 2018-11-02 because ECCS2 cannot check non-standard login page',
    #'https://ssl.education.lu/saml/saml2/idp/metadata.php':'Disabled on 2018-11-06 ECCS2 cannot check non-standard login page',
    'https://iif.iucc.ac.il/idp/saml2/idp/metadata.php':'Disabled on 2018-11-06 ECCS2 cannot check non-standard login page',
-   'https://my.atsu.edu/swPublicSSO/SAML/incommon':'Disabled on 2020-11-17 because ECCS2 cannot check non-standard login page'
+   'https://my.atsu.edu/swPublicSSO/SAML/incommon':'Disabled on 2020-11-17 because ECCS2 cannot check non-standard login page',
+   'https://edugain-proxy.igtf.net/simplesaml/saml2/idp/metadata.php':'Disabled on 2017-03-17 on request of federation operator'
 }
diff --git a/requirements.txt b/requirements.txt
index 1bb6041a579846552a98ff32e3a146313e4b9739..8c7fe6cd154a276cc0d3373a9431abad212de84c 100644
--- a/requirements.txt
+++ b/requirements.txt
@@ -1,5 +1,7 @@
-urllib3==1.25.9
+urllib3==1.26.6
 Flask==1.1.2
-Flask_RESTful==0.3.8
-requests==2.23.0
+Flask-RESTful==0.3.9
+requests==2.25.1
 selenium==3.141.0
+uWSGI==2.0.19.1
+Click==7.1.2
diff --git a/runEccs2.py b/runEccs2.py
index 7b8917c73d4f38e11afd6a2c8fcbe33fb9391db9..20cf79838d03d55b973dc22683ed42bc13dc222b 100755
--- a/runEccs2.py
+++ b/runEccs2.py
@@ -3,19 +3,23 @@
 import argparse
 import asyncio
 import datetime
-import eccs2properties
 import json
 import time

-from utils import getListFeds, getListEccsIdps, getRegAuthDict, getIdpList
-from eccs2properties import ECCS2FAILEDCMD, ECCS2FAILEDCMDIDP, ECCS2STDOUT, ECCS2STDERR, ECCS2STDOUTIDP, ECCS2STDERRIDP, ECCS2DIR, ECCS2NUMPROCESSES, ECCS2LISTIDPSURL, ECCS2LISTIDPSFILE, ECCS2LISTFEDSURL, ECCS2LISTFEDSFILE
+import eccs2properties as e2p
+import utils
+
 from subprocess import PIPE

+#from utils import get_list_feds, get_list_eccs_idps, get_reg_auth_dict, get_idp_list, gen_output
+#from eccs2properties import ECCS2FAILEDCMD, ECCS2FAILEDCMDIDP, ECCS2STDOUT, ECCS2STDERR, ECCS2STDOUTIDP, ECCS2STDERRIDP, ECCS2DIR, ECCS2NUMPROCESSES, ECCS2LISTIDPSURL, ECCS2LISTIDPSFILE, ECCS2LISTFEDSURL, ECCS2LISTFEDSFILE, ECCS2OUTPUTDIR, ECCS2RESULTSLOG, ECCS2AUXDIR

 # Run Command
+# https://docs.python.org/3/library/asyncio-queue.html#examples
+# https://docs.python.org/3/library/asyncio-subprocess.html
 async def run(name,queue,stdout_file,stderr_file,cmd_file):
     while True:
-        # Get a "cmd item" out of the queue.
+        # Get a "cmd item" out of the queue or wait for the next one
         cmd = await queue.get()

         # Elaborate "cmd" from shell.
@@ -27,11 +31,14 @@ async def run(name,queue,stdout_file,stderr_file,cmd_file):

         stdout, stderr = await proc.communicate()

+        print(f'[{name} exited with {proc.returncode}] - {cmd!r}')
+
         if stdout:
-            stdout_file.write('-----\n[cmd-out]\n%s\n\n[stdout]\n%s' % (cmd,stdout.decode()))
+            stdout_file.write(f'-----\n[cmd]\n{cmd}\n\n[stdout]\n{stdout.decode()}')
+        # If an error occurred, the failed command is put into the 'cmd_file' (failed-cmd.sh / failed-cmd-idp.sh)
         if stderr:
-            stderr_file.write('-----\n[cmd-err]\n%s\n\n[stderr]\n%s' % (cmd,stderr.decode()))
-            cmd_file.write('%s\n' % cmd)
+            stderr_file.write(f'-----\n[cmd]\n{cmd}\n\n[stderr]\n{stderr.decode()}')
+            cmd_file.write(f'{cmd}\n')

         # Notify the queue that the "work cmd" has been processed.
         queue.task_done()
@@ -48,14 +55,12 @@ async def main(cmd_list,stdout_file,stderr_file,cmd_file):

     # Create worker tasks to process the queue concurrently.
     tasks = []
-    for i in range(ECCS2NUMPROCESSES):
-        task = asyncio.create_task(run("cmd-{%d}" % i, queue, stdout_file, stderr_file, cmd_file))
+    for i in range(e2p.ECCS2NUMPROCESSES):
+        task = asyncio.create_task(run(f"cmd-{i}", queue, stdout_file, stderr_file, cmd_file))
         tasks.append(task)

     # Wait until the queue is fully processed.
-    started_at = time.monotonic()
     await queue.join()
-    total_slept_for = time.monotonic() - started_at

     # Cancel our worker tasks.
     for task in tasks:
@@ -72,60 +77,81 @@ if __name__=="__main__":

     parser.add_argument("--idp", metavar="entityid", dest="idp_entityid", nargs=1, help="An IdP entityID")
     parser.add_argument("--test", action='store_true', dest="test", help="Test without effects")
+    parser.add_argument("--replace", action='store_true', help="Check an IdP and replace the result")
     args = parser.parse_args()

     start = time.time()

     # Setup list_feds
-    url = ECCS2LISTFEDSURL
-    dest_file = ECCS2LISTFEDSFILE
-    list_feds = getListFeds(url, dest_file)
+    url = e2p.ECCS2LISTFEDSURL
+    dest_file = e2p.ECCS2LISTFEDSFILE
+    list_feds = utils.get_list_feds(url, dest_file)

     # Setup list_eccs_idps
-    url = ECCS2LISTIDPSURL
-    dest_file = ECCS2LISTIDPSFILE
-    list_eccs_idps = getListEccsIdps(url, dest_file)
+    url = e2p.ECCS2LISTIDPSURL
+    dest_file = e2p.ECCS2LISTIDPSFILE
+    list_eccs_idps = utils.get_list_eccs_idps(url, dest_file)

     if (args.idp_entityid):
-        stdout_file = open(ECCS2STDOUTIDP,"w+")
-        stderr_file = open(ECCS2STDERRIDP,"w+")
-        cmd_file = open(ECCS2FAILEDCMDIDP,"w+")
-        idpJsonList = getIdpList(list_eccs_idps,idp_entityid=args.idp_entityid[0])
+        stdout_file = open(e2p.ECCS2STDOUTIDP,"w+")
+        stderr_file = open(e2p.ECCS2STDERRIDP,"w+")
+        cmd_file = open(e2p.ECCS2FAILEDCMDIDP,"w+")
+        idpJsonList = utils.get_idp_list(list_eccs_idps,idp_entityid=args.idp_entityid[0])

-        if (args.test is not True):
-            cmd = "%s/eccs2.py \'%s\'" % (ECCS2DIR,json.dumps(idpJsonList[0]))
-        else:
-            cmd = "%s/eccs2.py \'%s\' --test" % (ECCS2DIR,json.dumps(idpJsonList[0]))
+        if (args.test):
+            cmd = f"{e2p.ECCS2DIR}/eccs2.py '{json.dumps(idpJsonList[0])}' --test"
+        elif (args.replace):
+            cmd = f"{e2p.ECCS2DIR}/eccs2.py '{json.dumps(idpJsonList[0])}' --replace"
+        else:
+            cmd = f"{e2p.ECCS2DIR}/eccs2.py '{json.dumps(idpJsonList[0])}'"

+        # List of only one command
         proc_list = [cmd]
+
+        # Run Command
         asyncio.run(main(proc_list,stdout_file,stderr_file,cmd_file))

+        # Close File
+        stdout_file.close()
+        stderr_file.close()
+        cmd_file.close()
+
     else:
-        stdout_file = open(ECCS2STDOUT,"w+")
-        stderr_file = open(ECCS2STDERR,"w+")
-        cmd_file = open(ECCS2FAILEDCMD,"w+")
+        stdout_file = open(e2p.ECCS2STDOUT,"w+")
+        stderr_file = open(e2p.ECCS2STDERR,"w+")
+        cmd_file = open(e2p.ECCS2FAILEDCMD,"w+")

         # Prepare input file for ECCS2
-        regAuthDict = getRegAuthDict(list_feds)
-
-        for name,regAuth in regAuthDict.items():
-            idpJsonList = getIdpList(list_eccs_idps,regAuth)
-
-            num_idps = len(idpJsonList)
-            if (args.test is not True):
-                cmd_list = [["%s/eccs2.py \'%s\'" % (ECCS2DIR, json.dumps(idp))] for idp in idpJsonList]
-            else:
-                cmd_list = [["%s/eccs2.py \'%s\' --test" % (ECCS2DIR, json.dumps(idp))] for idp in idpJsonList]
-
-            proc_list = []
-            count = 0
-            while (count < num_idps):
-                cmd = "".join(cmd_list.pop())
-                proc_list.append(cmd)
-                count = count + 1
+        regAuthDict = utils.get_reg_auth_dict(list_feds)
+
+        #for name,regAuth in regAuthDict.items():
+        # Load the idps belonging to a Federation into idpJsonList
+        idpJsonList = utils.get_idp_list(list_eccs_idps)
+        num_idps = len(idpJsonList)
+
+        # Construct the list of commands to exec
+        if (args.test):
+            cmd_list = [[f"{e2p.ECCS2DIR}/eccs2.py '{json.dumps(idp)}' --test"] for idp in idpJsonList]
+        elif (args.replace):
+            cmd_list = [[f"{e2p.ECCS2DIR}/eccs2.py '{json.dumps(idp)}' --replace"] for idp in idpJsonList]
+        else:
+            cmd_list = [[f"{e2p.ECCS2DIR}/eccs2.py '{json.dumps(idp)}'"] for idp in idpJsonList]
+
+        # String Conversion needed for Asyncio
+        proc_list = []
+        count = 0
+        while (count < num_idps):
+            cmd = "".join(cmd_list.pop())
+            proc_list.append(cmd)
+            count = count + 1

-            asyncio.run(main(proc_list,stdout_file,stderr_file,cmd_file))
+        asyncio.run(main(proc_list,stdout_file,stderr_file,cmd_file))
+
+        stdout_file.close()
+        stderr_file.close()
+        cmd_file.close()

     end = time.time()
+    #utils.gen_output(e2p.ECCS2AUXDIR,f"{e2p.ECCS2OUTPUTDIR}/{e2p.ECCS2RESULTSLOG}")
     print("Time taken in hh:mm:ss - ", str(datetime.timedelta(seconds=end - start)))
diff --git a/utils.py b/utils.py
index 6f95b3900a73c4777594e3a2b36ef4a43f821ea2..e433d3180a00ed78bf7def360ecb67e90cfcec63 100644
--- a/utils.py
+++ b/utils.py
@@ -1,18 +1,42 @@
 #!/usr/bin/env python3

+import datetime
 import json
 import logging
 import pathlib
+import re
 import requests
 import sys
+import shutil
+import time
+
+import eccs2properties as e2p

-from eccs2properties import ECCS2SELENIUMLOGDIR, ECCS2SELENIUMPAGELOADTIMEOUT, ECCS2SELENIUMSCRIPTTIMEOUT, PATHCHROMEDRIVER
 from selenium import webdriver
-from selenium.common.exceptions import WebDriverException
+from selenium.common.exceptions import WebDriverException,TimeoutException
+from selenium.webdriver.common.by import By
+from selenium.webdriver.support.ui import WebDriverWait
+from selenium.webdriver.support import expected_conditions as EC
+from selenium.webdriver.chrome.options import Options
+from logging.handlers import RotatingFileHandler
+from urllib3.util import parse_url
+
+def sha1(idp_entity_id):
+    import hashlib
+    result = hashlib.sha1(idp_entity_id.encode())
+    return result.hexdigest()
+
+
+# Return a label useful for a filename
+def get_label(url_or_urn):
+    if url_or_urn.startswith('http'):
+        return parse_url(url_or_urn)[2]
+    else:
+        return parse_url(url_or_urn)[4].lstrip('/')


 # Returns a Dict of "{ nameFed:reg_auth }"
-def getRegAuthDict(list_feds):
+def get_reg_auth_dict(list_feds):
     regAuth_dict = {}

     for key,value in list_feds.items():
@@ -25,7 +49,7 @@ def getRegAuthDict(list_feds):


 # Returns a list of IdP for a single federation
-def getIdpList(list_eccs_idps,reg_auth=None,idp_entityid=None):
+def get_idp_list(list_eccs_idps,reg_auth=None,idp_entityid=None):
     fed_idp_list = []
     for idp in list_eccs_idps:
         if (idp_entityid):
@@ -40,8 +64,10 @@ def getIdpList(list_eccs_idps,reg_auth=None,idp_entityid=None):
     return fed_idp_list


-# Returns a Python Dictionary
-def getListFeds(url, dest_file):
+# Download all eduGAIN Federations from URL, store them on a local file and returns a Python Dictionary
+def get_list_feds(url,
dest_file): + from pathlib import Path + # If file does not exists... download it into the dest_file path = pathlib.Path(dest_file) if(path.exists() == False): @@ -54,7 +80,9 @@ def getListFeds(url, dest_file): # Download all eduGAIN IdPs from URL, store them on a local file and returns a Python List -def getListEccsIdps(url, dest_file): +def get_list_eccs_idps(url, dest_file): + from pathlib import Path + # If file does not exists... download it into the dest_file path = pathlib.Path(dest_file) if(path.exists() == False): @@ -67,10 +95,10 @@ def getListEccsIdps(url, dest_file): # Use logger to produce files consumed by ECCS-2 API -def getLogger(filename, path, mode, log_level="DEBUG"): +def get_logger(path, filename, mode="a", log_level="DEBUG"): - logger = logging.getLogger(filename) - ch = logging.FileHandler("%s/%s" % (path,filename), mode,'utf-8') + logger = logging.getLogger(__name__) + ch = logging.handlers.RotatingFileHandler(f"{path}/{filename}", mode, 0, 5, 'utf-8') if (log_level == "DEBUG"): logger.setLevel(logging.DEBUG) @@ -88,7 +116,7 @@ def getLogger(filename, path, mode, log_level="DEBUG"): logger.setLevel(logging.CRITICAL) ch.setLevel(logging.CRITICAL) - formatter = logging.Formatter('%(message)s') + formatter = logging.Formatter('%(asctime)s - %(levelname)s - %(module)s - %(message)s', datefmt='%d/%m/%Y %H:%M:%S') ch.setFormatter(formatter) logger.addHandler(ch) @@ -96,7 +124,7 @@ def getLogger(filename, path, mode, log_level="DEBUG"): # Return a list of email address for a specific type of contact -def getIdPContacts(idp,contactType): +def get_idp_contacts(idp,contactType): ctcList = [] for ctcType in idp['contacts']: if (ctcType == contactType): @@ -110,34 +138,194 @@ def getIdPContacts(idp,contactType): ctcList.append('missing email') return ctcList +# Write the login page source code into its file +def store_page_source(page_source,idp,sp,test): + if (test): + sys.stdout.write(f"{page_source}") + return True + else: + # Put the page_source 
into an appropriate HTML file + with open(f"{e2p.ECCS2HTMLDIR}/{e2p.DAY}/{sha1(idp['entityID'])}---{get_label(sp)}.html","w") as html: + try: + html.write(page_source) + return True + except IOError: + return False -def getDriver(fqdn_idp=None,debugSelenium=False): - # Disable SSL requests warning messages - requests.packages.urllib3.disable_warnings() +# Get the Google Chrome Selenium Driver +def get_driver_selenium(idp=None,sp=None,debugSelenium=False): # Configure Web-driver - chrome_options = webdriver.ChromeOptions() + # https://peter.sh/experiments/chromium-command-line-switches/ + chrome_options = Options() + chrome_options.page_load_strategy = 'eager' + + chrome_options.add_argument('--start-in-incognito') chrome_options.add_argument('--headless') chrome_options.add_argument('--no-sandbox') chrome_options.add_argument('--disable-dev-shm-usage') - chrome_options.add_argument('--ignore-certificate-errors') - chrome_options.add_argument('--remote-debugging-port=9222') - #chrome_options.add_argument('--start-maximized') + chrome_options.add_argument('--disable-gpu') + chrome_options.add_argument('--disable-extensions') + chrome_options.add_argument('--disable-dinosaur-easter-egg') + chrome_options.add_argument('--disable-sync') # For DEBUG only (By default ChromeDriver logs only warnings/errors to stderr. # When debugging issues, it is helpful to enable more verbose logging.) 
+ if (debugSelenium): + label_idp = get_label(idp['entityID']) + label_sp = get_label(sp) + sha1_idp = sha1(idp['entityID']) + try: + driver = webdriver.Chrome(e2p.PATHCHROMEDRIVER, options=chrome_options, service_args=['--verbose', f'--log-path={e2p.ECCS2SELENIUMLOGDIR}/{sha1_idp}_{label_idp}_{label_sp}.log']) + except WebDriverException: + time.sleep(3) + driver = webdriver.Chrome(e2p.PATHCHROMEDRIVER, options=chrome_options, service_args=['--verbose', f'--log-path={e2p.ECCS2SELENIUMLOGDIR}/{sha1_idp}_{label_idp}_{label_sp}.log']) + else: + try: + driver = webdriver.Chrome(e2p.PATHCHROMEDRIVER, options=chrome_options) + except WebDriverException: + time.sleep(3) + driver = webdriver.Chrome(e2p.PATHCHROMEDRIVER, options=chrome_options) + return driver + +def check_idp_response_selenium(sp,idp,test): + + # Disable SSL requests warning messages + #requests.packages.urllib3.disable_warnings() + + # Common variables + fqdn_idp = get_label(idp['Location']) + wayfless_url = f"{sp}{idp['entityID']}" + robots = "" + federations_disabled_dict = e2p.FEDS_DISABLED_DICT + idps_disabled_dict = e2p.IDPS_DISABLED_DICT + webdriver_error = 0 # No WebDriver Error + + # SELENIUM CONSTANTS + http_code = "NULL" + + # Handle Disabled Idps/Federations + if (idp['registrationAuthority'] in federations_disabled_dict.keys()): + check_time = datetime.datetime.utcnow().strftime('%Y-%m-%dT%H:%M:%S') + 'Z' + page_source = federations_disabled_dict[idp['registrationAuthority']] + store_page_source(page_source,idp,sp,test) + return (idp['entityID'],wayfless_url,check_time,"NULL","DISABLED",webdriver_error) + + if (idp['entityID'] in idps_disabled_dict.keys()): + check_time = datetime.datetime.utcnow().strftime('%Y-%m-%dT%H:%M:%S') + 'Z' + page_source = idps_disabled_dict[idp['entityID']] + store_page_source(page_source,idp,sp,test) + return (idp['entityID'],wayfless_url,check_time,"NULL","DISABLED",webdriver_error) + + # Robots + SSL Check try: - if (debugSelenium and fqdn_idp): - driver = 
webdriver.Chrome(PATHCHROMEDRIVER, options=chrome_options, service_args=['--verbose', '--log-path=%s/%s.log' % (ECCS2SELENIUMLOGDIR, fqdn_idp)]) + hdrs = { + 'User-Agent': f'{e2p.ROBOTS_USER_AGENT}' + } + check_time = datetime.datetime.utcnow().strftime('%Y-%m-%dT%H:%M:%S') + 'Z' + robots = requests.get(f"https://{fqdn_idp}/robots.txt", headers=hdrs, verify=True, timeout=e2p.ECCS2REQUESTSTIMEOUT) + + if (robots == ""): + robots = requests.get(f"http://{fqdn_idp}/robots.txt", headers=hdrs, verify=False, timeout=e2p.ECCS2REQUESTSTIMEOUT) + + # Catch SSL Exceptions and block the ECCS check + except requests.exceptions.SSLError as e: + if (test): page_source = f"\nAn SSL Error occurred while opening https://{fqdn_idp}/robots.txt.\n\n{e}\n\nCheck it on SSL Labs: https://www.ssllabs.com/ssltest/analyze.html?d={fqdn_idp}" + else: page_source = f"<h1>SSL ERROR</h1><h2>An SSL error occurred for the server {fqdn_idp}:</h2><p>{e}</p><p>Check it on SSL Labs: <a href='https://www.ssllabs.com/ssltest/analyze.html?d={fqdn_idp}'>Click Here</a></p>" + store_page_source(page_source,idp,sp,test) + return (idp['entityID'],wayfless_url,check_time,"(failed)","SSL-Error",webdriver_error) + + except requests.exceptions.ConnectionError as e: + if (test): page_source = f"\nA Connection error occurred while opening https://{fqdn_idp}/robots.txt.\n\n{e}" + else: page_source = f"<h1>CONNECTION ERROR:</h1><h2>A Connection error occurred while opening <a href='https://{fqdn_idp}/robots.txt'>https://{fqdn_idp}/robots.txt</a>:</h2><p>{e}</p>" + store_page_source(page_source,idp,sp,test) + return (idp['entityID'],wayfless_url,check_time,"(failed)","Connection-Error",webdriver_error) + + except requests.exceptions.Timeout as e: + if (test): page_source = f"\nThe request timed out while trying to connect to the remote server '{fqdn_idp}':\n\n{e}" + else: page_source = f"<h1>CONNECTION TIMEOUT</h1><h2>The request timed out while opening <a 
href='https://{fqdn_idp}/robots.txt'>https://{fqdn_idp}/robots.txt</a>:</h2><p>{e}</p>" + store_page_source(page_source,idp,sp,test) + return (idp['entityID'],wayfless_url,check_time,"(failed)","Connection-Error",webdriver_error) + + except requests.exceptions.TooManyRedirects as e: + if (test): page_source = f"\nToo many redirects occurred while opening: https://{fqdn_idp}/robots.txt.\n\n{e}" + else: page_source = f"<h1>TOO MANY REDIRECTS</h1><h2>Too many redirects occurred while opening: <a href='https://{fqdn_idp}/robots.txt'>https://{fqdn_idp}/robots.txt</a>:</h2><p>{e}</p>" + store_page_source(page_source,idp,sp,test) + return (idp['entityID'],wayfless_url,check_time,"(failed)","Connection-Error",webdriver_error) + + if (robots): + check_time = datetime.datetime.utcnow().strftime('%Y-%m-%dT%H:%M:%S') + 'Z' + p = re.compile('^User-Agent:\sECCS\sDisallow:\s\/\s*$', re.MULTILINE) + m = p.search(robots.text) + + if (m): + page_source = "IdP excluded from check by robots.txt" + store_page_source(page_source,idp,sp,test) + return (idp['entityID'],wayfless_url,check_time,"NULL","DISABLED",webdriver_error) + + try: + # WebDriver MUST be instanced here to avoid problems with SESSION + driver = get_driver_selenium(idp,sp,e2p.ECCS2SELENIUMDEBUG) + + # Exception of WebDriver raises + if (driver == None): + sys.stderr.write(f"get_driver_selenium() returned None for IDP {idp['entityID']}(SHA1: {sha1(idp['entityID'])}) with SP {get_label(sp)}") + return None + + driver.set_page_load_timeout(e2p.ECCS2SELENIUMPAGELOADTIMEOUT) + driver.set_script_timeout(e2p.ECCS2SELENIUMSCRIPTTIMEOUT) + + check_time = datetime.datetime.utcnow().strftime('%Y-%m-%dT%H:%M:%S') + 'Z' + driver.get(wayfless_url) + + elem = WebDriverWait(driver, e2p.ECCS2SELENIUMPAGELOADTIMEOUT).until( + EC.presence_of_element_located((By.XPATH,'//form//input[@type="password"]')) + ) + page_source = driver.page_source + + if (test): pgsrc = f"\n[WAYFLESS URL]{wayfless_url} - OK" + else: pgsrc = page_source + stored = 
store_page_source(pgsrc,idp,sp,test) + if (stored): + return (idp['entityID'],wayfless_url,check_time,http_code,"OK",webdriver_error) + + except TimeoutException as e: + page_source = driver.page_source + metadata_not_found = re.search(e2p.METADATAPATTERN,page_source, re.I) + + if (metadata_not_found): + if (test): pgsrc = f"\n[PAGE_SOURCE]\n{page_source}\n[WAYFLESS URL]{wayfless_url} - METADATA NOT FOUND" + else: pgsrc = page_source + stored = store_page_source(pgsrc,idp,sp,test) + if (stored): + return (idp['entityID'],wayfless_url,check_time,http_code,"No-eduGAIN-Metadata",webdriver_error) else: - driver = webdriver.Chrome(PATHCHROMEDRIVER, options=chrome_options) + if (test): pgsrc = f"Timeout: No access form found in {e2p.ECCS2SELENIUMPAGELOADTIMEOUT} seconds" + else: pgsrc = page_source + stored = store_page_source(pgsrc,idp,sp,test) + if (stored): + return (idp['entityID'],wayfless_url,check_time,"(failed)","Timeout",webdriver_error) + except WebDriverException as e: - sys.stderr.write("!!! 
WEB DRIVER EXCEPTION - RUN AGAIN THE COMMAND!!!") - sys.stderr.write(e.__str__()) - return None + error = e.__dict__['msg'].split('(')[0].rstrip() + if (test): pgsrc = f"\nERROR: {error}" + else: pgsrc = f"<h1>ECCS2 CHECK FAILED</h1><h2>The IdP Login failed the check due the following error:</h2><p>{error}</p>" + webdriver_error = 1 + stored = store_page_source(pgsrc,idp,sp,test) + if (stored): + return (idp['entityID'],wayfless_url,check_time,"(failed)","ERROR",webdriver_error) - # Configure timeouts - driver.set_page_load_timeout("%d" % ECCS2SELENIUMPAGELOADTIMEOUT) - driver.set_script_timeout("%d" % ECCS2SELENIUMSCRIPTTIMEOUT) + finally: + driver.quit() - return driver +def delete_line_with_word(filepath,word): + import os.path + + if os.path.isfile(filepath): + with open(filepath, "r") as f: + lines = f.readlines() + + with open(filepath, "w") as f: + for line in lines: + if word not in line: + f.write(line) diff --git a/web/eccs2.js b/web/eccs2.js index d74b2a0d1221762df69c0555497c1094a42f1d9c..ee462de7091647ee97d535aa6e9d978a9452bd90 100644 --- a/web/eccs2.js +++ b/web/eccs2.js @@ -3,6 +3,145 @@ var table; var url = "/eccs2/api/eccsresults?eccsdt=1"; var infoCircle = '<a href="https://wiki.geant.org/display/eduGAIN/eduGAIN+Connectivity+Check+2#eduGAINConnectivityCheck2-Statusesandresults"><i class="fas fa-info-circle"></i></a>'; +/* + * Secure Hash Algorithm (SHA1) + * https://www.webtoolkit.info/javascript_sha1.html +*/ +function SHA1(msg) { + function rotate_left(n,s) { + var t4 = ( n<<s ) | (n>>>(32-s)); + return t4; + }; + function lsb_hex(val) { + var str=''; + var i; + var vh; + var vl; + for( i=0; i<=6; i+=2 ) { + vh = (val>>>(i*4+4))&0x0f; + vl = (val>>>(i*4))&0x0f; + str += vh.toString(16) + vl.toString(16); + } + return str; + }; + function cvt_hex(val) { + var str=''; + var i; + var v; + for( i=7; i>=0; i-- ) { + v = (val>>>(i*4))&0x0f; + str += v.toString(16); + } + return str; + }; + function Utf8Encode(string) { + string = 
string.replace(/\r\n/g,'\n'); + var utftext = ''; + for (var n = 0; n < string.length; n++) { + var c = string.charCodeAt(n); + if (c < 128) { + utftext += String.fromCharCode(c); + } + else if((c > 127) && (c < 2048)) { + utftext += String.fromCharCode((c >> 6) | 192); + utftext += String.fromCharCode((c & 63) | 128); + } + else { + utftext += String.fromCharCode((c >> 12) | 224); + utftext += String.fromCharCode(((c >> 6) & 63) | 128); + utftext += String.fromCharCode((c & 63) | 128); + } + } + return utftext; + }; + var blockstart; + var i, j; + var W = new Array(80); + var H0 = 0x67452301; + var H1 = 0xEFCDAB89; + var H2 = 0x98BADCFE; + var H3 = 0x10325476; + var H4 = 0xC3D2E1F0; + var A, B, C, D, E; + var temp; + msg = Utf8Encode(msg); + var msg_len = msg.length; + var word_array = new Array(); + for( i=0; i<msg_len-3; i+=4 ) { + j = msg.charCodeAt(i)<<24 | msg.charCodeAt(i+1)<<16 | + msg.charCodeAt(i+2)<<8 | msg.charCodeAt(i+3); + word_array.push( j ); + } + switch( msg_len % 4 ) { + case 0: + i = 0x080000000; + break; + case 1: + i = msg.charCodeAt(msg_len-1)<<24 | 0x0800000; + break; + case 2: + i = msg.charCodeAt(msg_len-2)<<24 | msg.charCodeAt(msg_len-1)<<16 | 0x08000; + break; + case 3: + i = msg.charCodeAt(msg_len-3)<<24 | msg.charCodeAt(msg_len-2)<<16 | msg.charCodeAt(msg_len-1)<<8 | 0x80; + break; + } + word_array.push( i ); + while( (word_array.length % 16) != 14 ) word_array.push( 0 ); + word_array.push( msg_len>>>29 ); + word_array.push( (msg_len<<3)&0x0ffffffff ); + for ( blockstart=0; blockstart<word_array.length; blockstart+=16 ) { + for( i=0; i<16; i++ ) W[i] = word_array[blockstart+i]; + for( i=16; i<=79; i++ ) W[i] = rotate_left(W[i-3] ^ W[i-8] ^ W[i-14] ^ W[i-16], 1); + A = H0; + B = H1; + C = H2; + D = H3; + E = H4; + for( i= 0; i<=19; i++ ) { + temp = (rotate_left(A,5) + ((B&C) | (~B&D)) + E + W[i] + 0x5A827999) & 0x0ffffffff; + E = D; + D = C; + C = rotate_left(B,30); + B = A; + A = temp; + } + for( i=20; i<=39; i++ ) { + temp = 
(rotate_left(A,5) + (B ^ C ^ D) + E + W[i] + 0x6ED9EBA1) & 0x0ffffffff; + E = D; + D = C; + C = rotate_left(B,30); + B = A; + A = temp; + } + for( i=40; i<=59; i++ ) { + temp = (rotate_left(A,5) + ((B&C) | (B&D) | (C&D)) + E + W[i] + 0x8F1BBCDC) & 0x0ffffffff; + E = D; + D = C; + C = rotate_left(B,30); + B = A; + A = temp; + } + for( i=60; i<=79; i++ ) { + temp = (rotate_left(A,5) + (B ^ C ^ D) + E + W[i] + 0xCA62C1D6) & 0x0ffffffff; + E = D; + D = C; + C = rotate_left(B,30); + B = A; + A = temp; + } + H0 = (H0 + A) & 0x0ffffffff; + H1 = (H1 + B) & 0x0ffffffff; + H2 = (H2 + C) & 0x0ffffffff; + H3 = (H3 + D) & 0x0ffffffff; + H4 = (H4 + E) & 0x0ffffffff; + } + var temp = cvt_hex(H0) + cvt_hex(H1) + cvt_hex(H2) + cvt_hex(H3) + cvt_hex(H4); + + return temp.toLowerCase(); +} + + // PHP Variables retrieved from eccs2.php // idp (entityID of the IdP) // date (date time of the check) @@ -21,6 +160,7 @@ if (status) { url = url.concat("&status=" + status); } + function getPastResults() { var checkDate = $.datepicker.formatDate("yy-mm-dd", $('#datepicker').datepicker().datepicker('getDate')); @@ -99,7 +239,7 @@ function format ( d ) { '<td>'+d.sp1.checkTime+'</td>'+ '<td>'+getCheckResult(d.sp1.checkResult)+'</td>'+ '<td>'+d.sp1.httpCode+'</td>'+ - '<td><a href="/eccs2html/'+d.date+'/'+getHostname(d.entityID)+'---'+getHostname(d.sp1.wayflessUrl)+'.html" target="_blank">Click to open</a></td>'+ + '<td><a href="/eccs2html/'+d.date+'/'+SHA1(d.entityID)+'---'+getHostname(d.sp1.wayflessUrl)+'.html" target="_blank">Click to open</a></td>'+ '<td><a href="'+d.sp1.wayflessUrl+'" target="_blank">Click to retry</a></td>'+ '</tr>'+ '<tr>'+ @@ -108,7 +248,7 @@ function format ( d ) { '<td>'+d.sp2.checkTime+'</td>'+ '<td>'+getCheckResult(d.sp2.checkResult)+'</td>'+ '<td>'+d.sp2.httpCode+'</td>'+ - '<td><a href="/eccs2html/'+d.date+'/'+getHostname(d.entityID)+'---'+getHostname(d.sp2.wayflessUrl)+'.html" target="_blank">Click to open</a></td>'+ + '<td><a 
href="/eccs2html/'+d.date+'/'+SHA1(d.entityID)+'---'+getHostname(d.sp2.wayflessUrl)+'.html" target="_blank">Click to open</a></td>'+ '<td><a href="'+d.sp2.wayflessUrl+'" target="_blank">Click to retry</a></td>'+ '</tr>'+ '</table>';
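
Note on the file naming used by this change: the `web/eccs2.js` links above only resolve if the browser builds the same `<sha1>---<sp-host>.html` name that `store_page_source()` writes in `utils.py`. The following is a minimal sketch of that naming rule, not the project's code. The helper names `sha1_hex`, `host_label` and `html_report_name` are hypothetical, the SP URL is a made-up example, and the standard library's `urllib.parse.urlparse` is swapped in for the `urllib3` `parse_url` call used in the diff.

```python
import hashlib
from urllib.parse import urlparse


def sha1_hex(entity_id: str) -> str:
    """Hex SHA-1 digest of an entityID (UTF-8 encoded), as sha1() does in utils.py."""
    return hashlib.sha1(entity_id.encode()).hexdigest()


def host_label(url: str) -> str:
    """Hostname part of an http(s) URL (hypothetical stand-in for get_label())."""
    return urlparse(url).hostname or ""


def html_report_name(entity_id: str, wayfless_url: str) -> str:
    """Build '<sha1>---<sp-host>.html', the pattern the JS links expect."""
    return f"{sha1_hex(entity_id)}---{host_label(wayfless_url)}.html"


if __name__ == "__main__":
    # Hypothetical IdP entityID and SP wayfless URL, for illustration only.
    print(html_report_name(
        "https://idp.example.org/idp/shibboleth",
        "https://sp.example.org/Shibboleth.sso/Login?entityID=",
    ))
```

Hashing the entityID rather than using its hostname keeps the file name unique when several IdPs share a host, and it also gives URN-style entityIDs a filesystem-safe name.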