SQL DBA WORK FLOW

Published on by LakshmiSaahul,Dhana Royal

PROCESS:

1

ITIL (Information Technology Infrastructure Library) Process

2

DBAMaintains Activities(Daily\weekly\Monthly)

3

DR (Disaster Recovery) Plan

4

BCP (Business continuity Plan) Plan

5

RACI (Responsible Accountable Consult Inform)Matrix

6

RCA ( Root Cause Analysis) Plan

7

SLA (Service level agreement )

8

Capacity planning\management

9

Interview Handling

10

Day-to-Day Activities

11

General Responsibility

12

Ticket, Monitoring and backup Tool

13

Escalation Matrix

14. KT Questions

15 On-call & Bridge call

ITIL (Information Technology Infrastructure Library) Process:

  • It is a set of good practices and it is not a standard
  • ITIL V1, V2 and V3. Present works in V3 and remain all are expired
  • ITIL has 5 phases/sections
  1. Service strategy
  2. Service design
  3. Service transactions
  4. Service operations
  5. Service improvement

Service transactions phase: Server build take care by ST team before server goes into Production or Go-Live.

Before go-live ST team handover the server to Service Operations team for continuous support.

Change Management:

  • Change Management is introducing modifications to an existing environment
  • It’s a planned activity
  • Its having 3 stages
  1. Normal Changes (CR)::: Ex: File movement, adding file, changing memory parameters, High availability configurations…
  2. Standard Changes (S-CR) ::: Patch management, DR Testing
  3. Emergency Changes (E-CR) :::disk full ,100% cpu utilization, log file full

Incident Management

  • Incident means Issue
  • Incident is a unexpected interruption to running service and it can cause major\minor damage to the business
  • Incident is sometime bug, ticket, complaint
  • Incident has categories i.e. critical(p1), high(p2), medium(p3)and low(p4)

Type

Response

Resolve

P1

15min

4hrs [Platinum]

P2

30min

8hrs [Gold]

P3

30min

2 days [silver]

P4

P5

1 day

1 Day

3days [bronze]

7 Days[plastic]

P1: Production db down or SQL Server services down, Application not able to connect to database.

P2: CPU or memory usage 98%, disk space is at 99%. Authorisation jobs

P3, P4: Job failure, disk space warnings, login failure, password reset....etc

P5: All warnings..Like disk space is at 60%. Or log file size is at 60%.

  • Incident Management is unplanned task
  1. System raised: It is a system generated alert and Monitoring tool finds any errors in SQL Server logs or job failures or disk space issues or tempdb issues any.will be raised incident or call.
  2. User raised : When user finds any unexpected issues in sql server then user raises ticket or incident or call to DBA team.

Problem Management:

  • Problem Management handles recurring incidents.Problem ManagementNaming convention is ‘PRB’
  • Identify the causes, resolve if it is from root
  • Generally Sr.L3/L4 will be doing problem management
  • For every Problem once it is resolved requires a RCA (Root cause analysis )

RAC [Root cause analysis]: RAC document should be prepared for feature reference for all problem cases.

RCA document should include:

  • Issue name
  • Why issue happened
  • What is the resolution?
  • Pro-active not to raise same issue again [Mitigation plan]

Capacity planning\management:

  • Forecasting the growth of database based on allocating the disk space is called Capacity planning.
  • It is nothing but a Disk planning

Server\Database Hardening rules: Harding is to protect the server or database from threats

Rules:

  1. Remove un necessary roles to the users\logins
  2. Provide strong sa password
  3. Disable built-in administrator account
  4. Remove unnecessary users\logins
  5. Always create instead of individual logins create groups

BCP (Business continuity Plan) Plan:

  • BCP is to defines and perform the day-to-day operations at multiple locations.
  • If one server is down then another server is do the process
  • Every year once do the BCP test

RCA (Root Cause Analysis) Plan:

  • This is related to Problem management
  • In root cause analysis
  1. Issue/problem
  2. Solution
  3. Measurement

RACI Matrix: Defines Roles and responsibilities from each team which relate to each activity.

  1. R- Responsibility
  2. A-Accountable
  3. C-Consult
  4. I-Inform
  • RACI is a simple Excel sheet

DBA Maintains Activities:

Daily Maintaince : full or Differential backup ,Transaction log backup, Blocking jobs,CPU Utilization

History cleanup job, application jobs

Weekly Maintaince: Full backup, Rebuild indexes, Reorganize index, DBCC Checkdb , Purging jobs, update statistics jobs

Monthly Maintaince: Patch management, Backup file testing

Yearly Maintaince: DR Testing, BCP plan

Escalation Matrix: Follow escalation based on reporting levels.

Senior Delivery manager

Team lead(TL)

Delivery manager

Level 3(L3)

Level 2(L2)

Project manager

Level 1(L1)

SLA[Service Level Agreement]: This is the agreement between client and company

Type

Response

Resolve

P1

15min

4hrs

P2

30min

8hrs

P3

30min

1 days

P4

1 day

2days

P5 1 Day 5 Day

Disaster Recovery Plan:

This defines a business which needs to run continuously without having any service interruption.

Ex: By implementing mirroring, replication, and clustering.

Real time: DR test will happen every 1 year or 6 months based on the customer approval..

Team sizes:

Team size: 12

24*7

L1 ? 3

L2? 4

L3? 3

Team lead? 1

manager: 1

Shift Management:

8 Hr : 7:00 AM- 4:00 PM ::: 1 L1+1 L2+ 1 L3

8 Hr : 12:00 PM-9:00 PM :::1 L1+2 L2+ 1 L3

8 Hr : 8:30:00 PM -07:30 AM :::1 L1+1 L2+ 1 L3

Shift handover::::

To handover to next shift person,

  1. What are the new requests came into our queue
  2. What are the tasks are pending
  3. What are the calls need to attend
  4. How many tickets are closed?
  5. How many critical tickets are came in shift?

SQL DBA

Advertising
To be informed of the latest articles, subscribe:
Comment on this post