Incident response procedure

EGEE Incident Response Procedure

The Operational Security Coordination Team (OSCT) has produced this document to minimise the impact of security incidents, encourage post-mortem analysis and promote cooperation between sites. It is based on the EGEE Incident Response Policy.

The procedure described in this document is intended to complement the security policies already in place at the sites, while providing a simplified guide to the most important steps to take in case of a security incident.


Introduction

This procedure aims to minimise the impact of security incidents by encouraging post-mortem analysis and promoting cooperation between grid sites. It is based on the EGEE Incident Response Policy.
This grid incident response procedure is intended to complement local security procedures.

Definition

A security incident is the act of violating an explicit or implied security policy (e.g. a local security policy or the EGEE Acceptable Use Policy).

Intended Audience

This document is intended for grid site security contacts and site administrators and is primarily aimed at reporting security incidents.

Incident Response Procedure

When a security incident affecting grid hosts is suspected, the following procedure MUST be followed:

  1. Immediately inform your local security team and your ROC Security Contact (either directly or via project-egee-security-support (at) cern.ch). This step MUST be completed within 4 hours after the incident has been detected.
  2. If no support is available shortly, and provided that it is feasible, permitted by your local security procedure, and you are sufficiently familiar with the host/service to take responsibility for this action, try to contain the incident, for instance by unplugging the host's network cable. Do NOT reboot or power off the host.
  3. Assist your local security team and your ROC Security Contact in confirming the incident and then announcing it to all sites via project-egee-security-csirts (at) in2p3.fr. This step MUST be completed within 4 hours after the incident has been detected.
  4. If appropriate:
    - Report a downtime for the affected hosts in the GOCDB.
    - Send an EGEE broadcast announcing the downtime for the affected hosts. This can be done from the GOCDB broadcast page.
    For both the GOCDB downtime and the broadcast, use "Security operations in progress" as the reason, with no additional detail.
    This step MUST be completed within one working day after the incident has been detected.
  5. Perform appropriate forensics and take necessary corrective actions.
    - Identify and kill suspicious process(es) as appropriate, but aim to preserve the information they may have generated, both in memory and on disk if possible.
    - If it is suspected that some grid credentials have been abused or compromised, you MUST ensure the relevant accounts have been suspended.
    - If it is suspected that some grid credentials have been abused, you MUST ensure that the relevant VO manager(s) have been informed. VO contacts are available at the CIC Portal.
    - If it is suspected that some grid credentials have been compromised, you MUST ensure that the relevant CA has been informed. CA contacts are available at the EuGRIDPMA web site.
    - If needed, seek help from your local security team, your ROC Security Contact or the OSCT (project-egee-security-support (at) cern.ch).
    - If relevant, additional reports containing suspicious patterns, IP addresses, files or evidence that may be of use to other grid participants SHOULD be sent to project-egee-security-csirts (at) in2p3.fr. Never send potentially sensitive information (ex: IP addresses, usernames) without clearance from your local security team and/or your ROC Security Contact.
    The objective is to understand the source and the cause of the incident, the affected credentials and services, and the possible implications for the infrastructure.
    As part of the investigations, sites MUST be able to provide the relevant logging information (IP addresses, timestamps, identities involved) produced by local services concerning the source of any suspicious successful connection, going back:
    - 6 months prior to the discovery of the incident for successful SSH connections to grid services and for the originating submission host of grid jobs
    - 3 months prior to the discovery of the incident for all other grid-related services.
    For example, should an incident be detected and reported on 1st September, sites are expected to be able to produce the relevant logging information for suspicious SSH connections back to 1st March.
    As part of the security incident resolution process, sites are expected to produce the following information:
    - Host(s) affected (ex: compromised hosts, hosts running suspicious user code)
    - Host(s) used as a local entry point to the site (ex: UI or WMS IP address)
    - Remote IP address(es) of the attacker
    - Evidence of the compromise, including timestamps (ex: suspicious files or log entry)
    - What was lost, details of the attack (ex: compromised credentials, (root) compromised host)
    - If available and relevant, the list of other sites possibly affected
    - If available and relevant, possible vulnerabilities exploited by the attacker
    - The actions taken to resolve the incident
    Throughout step 5, requests from the Operational Security Coordination Team MUST be followed up within 4 hours.
  6. Coordinate with your local security team and your ROC Security Contact to send an incident closure report, including lessons learnt and the resolution, to all sites via project-egee-security-csirts (at) in2p3.fr within 1 month of the incident.
  7. Restore the service and, if needed, send an EGEE broadcast and update the GOCDB. Update service documentation and procedures as necessary to prevent recurrence.
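The timing requirements of the procedure above reduce to a few date calculations. The following is a minimal Python sketch, not part of the procedure itself. It assumes that the detection time is known, that "one working day" means the next Monday-to-Friday day at the same time of day (public holidays ignored), and that month-based periods are calendar months with the day clamped to the end of shorter months. The function names are hypothetical.

```python
import calendar
from datetime import datetime, timedelta, timezone


def shift_months(dt: datetime, months: int) -> datetime:
    """Shift dt by whole calendar months, clamping the day to the target month's length."""
    month_index = dt.month - 1 + months
    year = dt.year + month_index // 12  # floor division also handles negative shifts
    month = month_index % 12 + 1
    day = min(dt.day, calendar.monthrange(year, month)[1])
    return dt.replace(year=year, month=month, day=day)


def add_working_day(dt: datetime) -> datetime:
    """Next Monday-Friday day at the same time (assumption: holidays are ignored)."""
    nxt = dt + timedelta(days=1)
    while nxt.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
        nxt += timedelta(days=1)
    return nxt


def procedure_dates(detected: datetime) -> dict[str, datetime]:
    """Deadlines and log-retention start dates derived from the detection time."""
    return {
        "inform_local_and_roc_security": detected + timedelta(hours=4),  # step 1
        "announce_to_csirts": detected + timedelta(hours=4),             # step 3
        "downtime_and_broadcast": add_working_day(detected),             # step 4
        "closure_report": shift_months(detected, 1),                     # step 6
        "ssh_and_job_submission_logs_from": shift_months(detected, -6),  # step 5
        "other_service_logs_from": shift_months(detected, -3),           # step 5
    }


if __name__ == "__main__":
    detected = datetime(2009, 9, 1, 10, 0, tzinfo=timezone.utc)
    for name, when in procedure_dates(detected).items():
        print(f"{name:35} {when:%Y-%m-%d %H:%M %Z}")
```

With a detection time of 1st September 2009 at 10:00 UTC, the 6-month log window starts on 1st March 2009, matching the example in step 5. Clamping errs on the conservative side in both directions: log windows start slightly earlier, and the closure deadline falls slightly sooner, than if the day overflowed into the following month.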

When contacted, all recipients of project-egee-security-csirts@in2p3.fr are expected to take appropriate action, including processing the available information (ex: suspicious log entries, DNs or IP addresses), checking locally for signs of compromise, and reporting suspicious findings.
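For the local check described above, a site could search its sshd logs for successful logins from the IP addresses circulated on the list. The following minimal Python sketch assumes the syslog line format shown in the incident template example (OpenSSH "Accepted <method> for <user> from <ip>" messages). The log location, such as /var/log/secure or /var/log/auth.log, varies by distribution, and the function name is hypothetical. It covers SSH only; other grid services log in their own formats.

```python
import re
import sys
from typing import Iterable, NamedTuple

# Matches OpenSSH sshd syslog lines for successful logins, e.g.
# "Mar 24 12:00:09 grid-ui-101 sshd[13896]: Accepted password for root from 012.012.012.012 port 22 ssh2"
ACCEPTED_RE = re.compile(
    r"^(?P<timestamp>\w{3}\s+\d+\s[\d:]{8})\s+(?P<host>\S+)\s+sshd\[\d+\]:\s+"
    r"Accepted (?P<method>\S+) for (?P<user>\S+) from (?P<ip>\S+)"
)


class SshLogin(NamedTuple):
    timestamp: str
    host: str
    method: str
    user: str
    ip: str


def find_suspicious_logins(lines: Iterable[str], suspicious_ips: set[str]) -> list[SshLogin]:
    """Return successful SSH logins whose source address is in suspicious_ips."""
    hits = []
    for line in lines:
        match = ACCEPTED_RE.search(line)
        if match and match.group("ip") in suspicious_ips:
            hits.append(SshLogin(**match.groupdict()))
    return hits


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python find_logins.py /var/log/secure 012.012.012.012 [more IPs...]
    log_path, *ips = sys.argv[1:]
    with open(log_path, errors="replace") as log:
        for hit in find_suspicious_logins(log, set(ips)):
            print(hit)
```

Traditional syslog timestamps carry no year, and logs are usually rotated, so the rotated and archived files covering the whole retention period need to be searched as well.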

Templates for Reporting a Security Incident

Should a security incident be suspected, the use of the following email templates is encouraged.

The first template is aimed at notifying the grid participants soon after the incident has been discovered (heads-up), as described in Step 3 of the procedure above.

FROM: <your_email_address@your_organisation>
TO: < project-egee-security-csirts@in2p3.fr >
SUBJECT: Security incident suspected at <your site>

** PLEASE DO NOT REDISTRIBUTE ** EGEE-<DATE> (ex: EGEE-20090531)
** This message is sent to the EGEE CSIRTs and must NOT be publicly archived **

Dear CSIRTs,
It seems a security incident has been detected at <your site>.
Summary of the information available so far:
<Ex: A malicious SSH connection was detected from 012.012.012.012. The extent of the incident is
unclear for now, and more information will be published in the coming hours as forensics
progress at our site. In the meantime, all sites should check for successful SSH connections from
012.012.012.012 as a precautionary measure.>
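Where heads-up messages are prepared by scripts, the first template can be filled in programmatically. The following minimal Python sketch uses only the standard library email package; the helper name and its arguments are hypothetical, and the message is only built, not sent (sending depends on the local mail setup). The summary should contain only information cleared by your local security team and/or ROC Security Contact.

```python
from datetime import date
from email.message import EmailMessage

CSIRTS_LIST = "project-egee-security-csirts@in2p3.fr"


def build_heads_up(sender: str, site: str, summary: str, day: date) -> EmailMessage:
    """Fill in the heads-up template (Step 3). Hypothetical helper; it does not send anything."""
    tag = f"EGEE-{day:%Y%m%d}"  # e.g. EGEE-20090531
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = CSIRTS_LIST
    msg["Subject"] = f"Security incident suspected at {site}"
    msg.set_content(
        f"** PLEASE DO NOT REDISTRIBUTE ** {tag}\n"
        "** This message is sent to the EGEE CSIRTs and must NOT be publicly archived **\n"
        "\n"
        "Dear CSIRTs,\n"
        f"It seems a security incident has been detected at {site}.\n"
        "Summary of the information available so far:\n"
        f"{summary}\n"
    )
    return msg
```

For instance, build_heads_up("security@mysite.org", "MYSITE", "...", date(2009, 5, 31)) produces a message whose first body line carries the EGEE-20090531 tag.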

The second template can be used to provide a detailed view of the incident, and may be completed and reposted as the investigation progresses, as described in Step 5 of the procedure above.

A moderated Web form may also be used instead of email.

FROM: <your_email_address@your_organisation>
TO: < project-egee-security-csirts@in2p3.fr >
SUBJECT: Security incident suspected at <your site>

** PLEASE DO NOT REDISTRIBUTE ** EGEE-<DATE> (ex: EGEE-20090531)
** This message is sent to the EGEE CSIRTs and must NOT be publicly archived **

Dear CSIRTs,
It seems a security incident has been detected at <your site>.

- Short summary of the incident
<Provide a high level overview of the incident>

- Host(s) affected
< List of compromised hosts and/or hosts running suspicious user code.
ex: grid-worker-node-124.mysite.org (123.123.123.123)>

- Host(s) used as a local entry point to the site (ex: UI or WMS IP address)
   CERN-LCG-EDMS-867454  version 1.3 Last Saved on 20 Apr 2009
<The host that the attacker is likely to have used to access the site.
ex: grid-ui-101.mysite.org (123.123.123.124)>

- Remote IP address(es) of the attacker
<The remote host from which the attacker is likely to have connected.
ex: 123.adsl.somecorp.com (012.012.012.012)>

- Evidence of the compromise, including timestamps (ex: suspicious files or log entry)
<Ex: the attacker logged in as root from 123.adsl.somecorp.com. Times are UTC:
Mar 24 12:00:09 grid-ui-101 sshd[13896]: Accepted password for root from 012.012.012.012>

- What was lost, details of the attack
< Provide available details on the extent of the compromise. For ex:
System logs revealed the attacker guessed the root password of grid-ui-101 on Mar 24 12:00:09
(UTC) after hundreds of attempts. Then, the attacker [...] etc.>

- If available and relevant, the list of other sites possibly affected
<Ex: firewall logs reveal suspicious SSH connections from the compromised node to
grid-ui.friendlysite.org on Mar 24 13:01:03 (UTC). friendlysite.org has been contacted.>

- Possible vulnerabilities exploited by the attacker
<Ex: the attacker exploited a weak root password and gained further access by exploiting
CVE-2009-1234 against [...] etc.>

- The actions taken to resolve the incident
<Ex: Disk images have been saved, hosts have been reinstalled from scratch with new, strong root
passwords, and SSH has been configured to prevent "root" logins with a password.>

- Recommendations for other sites, actions suggested
<Ex: Sites should check and report any successful SSH connection from grid-ui-101 between Mar 24
12:00:09 (UTC) and Mar 24 17:00:00 (UTC).
It is also recommended to avoid direct SSH access, and to configure sshd with "PermitRootLogin
without-password".>

- Timeline of the incident
<Ex:
2009-03-24 09:12:43 Multiple SSH connection attempts from 012.012.012.012
2009-03-24 12:00:09 Attacker connects as root on grid-ui-101.mysite.org from 012.012.012.012
2009-03-24 13:01:03 SSH scan from grid-ui-101 against grid-ui.friendlysite.org
[...]
2009-03-24 15:00:00 Site security team investigating
2009-03-24 15:34:00 EGEE CSIRTs informed via project-egee-security-csirts@in2p3.fr
[...]>
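The example recommendation on PermitRootLogin can be checked locally. The following minimal Python sketch reads sshd_config text and reports the first PermitRootLogin value outside any Match block, since sshd generally uses the first value obtained for a keyword. It is a simplification: Include directives are not followed, only the "Keyword value" form is parsed, and Match blocks are skipped. Running "sshd -T" as root prints the effective configuration and is the more reliable check. Recent OpenSSH releases call the recommended value "prohibit-password" and accept "without-password" as a deprecated alias. The function names are hypothetical.

```python
from typing import Optional


def permit_root_login(config_text: str) -> Optional[str]:
    """First PermitRootLogin value outside any Match block, or None if not set.

    Simplifications: Include directives are not followed, only the
    "Keyword value" form is parsed, and Match blocks are skipped entirely.
    """
    in_match = False
    for raw in config_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        keyword = parts[0].lower()  # sshd_config keywords are case-insensitive
        if keyword == "match":
            in_match = True  # every following line belongs to a Match block
        elif not in_match and keyword == "permitrootlogin" and len(parts) == 2:
            return parts[1].strip()
    return None


def root_password_login_allowed(config_text: str) -> Optional[bool]:
    """Answer from the explicit setting; None when unset (the default depends on the OpenSSH version)."""
    value = permit_root_login(config_text)
    if value is None:
        return None
    # Only "yes" still lets root authenticate with a password; "prohibit-password",
    # "without-password", "forced-commands-only" and "no" do not.
    return value == "yes"
```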

Global Incident Coordination

A security incident coordinator from the EGEE Operational Security Coordination Team (OSCT, http://cern.ch/OSCT) is appointed for each incident.
The role of the OSCT incident coordinator includes:

  1. Actively prompt and probe the affected participants to obtain accurate information, at an appropriate level of detail and in a timely manner
  2. Aim to understand the exact cause of the incident, which assets have been compromised (credentials, etc.), and how to resolve the incident
  3. Help the involved sites to resolve the incident by providing recommendations, promoting collaboration with other sites and periodically checking their status
  4. Assume responsibility for contacting any other participant in EGEE involved in the incident, including VO managers. Contacts with external organisations MUST be made via the EGEE Security Officer
  5. Whenever and as often as necessary, send updated detailed reports to the sites directly involved in and affected by the incident, containing relevant findings or possible leads that could be used to resolve the incident
  6. Whenever and as often as necessary, send updated summary reports to all the CSIRTs (project-egee-security-csirts@in2p3.fr), containing the status of the incident and, where possible, the details needed to search locally for signs of malicious activity. Never send sensitive information without prior agreement of the originating site.

