SQL DBA WORK FLOW
PROCESS:
1
ITIL (Information Technology Infrastructure Library) Process
2
DBAMaintains Activities(Daily\weekly\Monthly)
3
DR (Disaster Recovery) Plan
4
BCP (Business continuity Plan) Plan
5
RACI (Responsible Accountable Consult Inform)Matrix
6
RCA ( Root Cause Analysis) Plan
7
SLA (Service level agreement )
8
Capacity planning\management
9
Interview Handling
10
Day-to-Day Activities
11
General Responsibility
12
Ticket, Monitoring and backup Tool
13
Escalation Matrix
14. KT Questions
15 On-call & Bridge call
ITIL (Information Technology Infrastructure Library) Process:
- It is a set of good practices and it is not a standard
- ITIL V1, V2 and V3. Present works in V3 and remain all are expired
- ITIL has 5 phases/sections
- Service strategy
- Service design
- Service transactions
- Service operations
- Service improvement
Service transactions phase: Server build take care by ST team before server goes into Production or Go-Live.
Before go-live ST team handover the server to Service Operations team for continuous support.
Change Management:
- Change Management is introducing modifications to an existing environment
- It’s a planned activity
- Its having 3 stages
- Normal Changes (CR)::: Ex: File movement, adding file, changing memory parameters, High availability configurations…
- Standard Changes (S-CR) ::: Patch management, DR Testing
- Emergency Changes (E-CR) :::disk full ,100% cpu utilization, log file full
Incident Management
- Incident means Issue
- Incident is a unexpected interruption to running service and it can cause major\minor damage to the business
- Incident is sometime bug, ticket, complaint
- Incident has categories i.e. critical(p1), high(p2), medium(p3)and low(p4)
Type
Response
Resolve
P1
15min
4hrs [Platinum]
P2
30min
8hrs [Gold]
P3
30min
2 days [silver]
P4
P5
1 day
1 Day
3days [bronze]
7 Days[plastic]
P1: Production db down or SQL Server services down, Application not able to connect to database.
P2: CPU or memory usage 98%, disk space is at 99%. Authorisation jobs
P3, P4: Job failure, disk space warnings, login failure, password reset....etc
P5: All warnings..Like disk space is at 60%. Or log file size is at 60%.
- Incident Management is unplanned task
- System raised: It is a system generated alert and Monitoring tool finds any errors in SQL Server logs or job failures or disk space issues or tempdb issues any.will be raised incident or call.
- User raised : When user finds any unexpected issues in sql server then user raises ticket or incident or call to DBA team.
Problem Management:
- Problem Management handles recurring incidents.Problem ManagementNaming convention is ‘PRB’
- Identify the causes, resolve if it is from root
- Generally Sr.L3/L4 will be doing problem management
- For every Problem once it is resolved requires a RCA (Root cause analysis )
RAC [Root cause analysis]: RAC document should be prepared for feature reference for all problem cases.
RCA document should include:
- Issue name
- Why issue happened
- What is the resolution?
- Pro-active not to raise same issue again [Mitigation plan]
Capacity planning\management:
- Forecasting the growth of database based on allocating the disk space is called Capacity planning.
- It is nothing but a Disk planning
Server\Database Hardening rules: Harding is to protect the server or database from threats
Rules:
- Remove un necessary roles to the users\logins
- Provide strong sa password
- Disable built-in administrator account
- Remove unnecessary users\logins
- Always create instead of individual logins create groups
BCP (Business continuity Plan) Plan:
- BCP is to defines and perform the day-to-day operations at multiple locations.
- If one server is down then another server is do the process
- Every year once do the BCP test
RCA (Root Cause Analysis) Plan:
- This is related to Problem management
- In root cause analysis
- Issue/problem
- Solution
- Measurement
RACI Matrix: Defines Roles and responsibilities from each team which relate to each activity.
- R- Responsibility
- A-Accountable
- C-Consult
- I-Inform
- RACI is a simple Excel sheet
DBA Maintains Activities:
Daily Maintaince : full or Differential backup ,Transaction log backup, Blocking jobs,CPU Utilization
History cleanup job, application jobs
Weekly Maintaince: Full backup, Rebuild indexes, Reorganize index, DBCC Checkdb , Purging jobs, update statistics jobs
Monthly Maintaince: Patch management, Backup file testing
Yearly Maintaince: DR Testing, BCP plan
Escalation Matrix: Follow escalation based on reporting levels.
Senior Delivery manager
Team lead(TL)
Delivery manager
Level 3(L3)
Level 2(L2)
Project manager
Level 1(L1)
SLA[Service Level Agreement]: This is the agreement between client and company
Type
Response
Resolve
P1
15min
4hrs
P2
30min
8hrs
P3
30min
1 days
P4
1 day
2days
P5 1 Day 5 Day
Disaster Recovery Plan:
This defines a business which needs to run continuously without having any service interruption.
Ex: By implementing mirroring, replication, and clustering.
Real time: DR test will happen every 1 year or 6 months based on the customer approval..
Team sizes:
Team size: 12
24*7
L1 ? 3
L2? 4
L3? 3
Team lead? 1
manager: 1
Shift Management:
8 Hr : 7:00 AM- 4:00 PM ::: 1 L1+1 L2+ 1 L3
8 Hr : 12:00 PM-9:00 PM :::1 L1+2 L2+ 1 L3
8 Hr : 8:30:00 PM -07:30 AM :::1 L1+1 L2+ 1 L3
Shift handover::::
To handover to next shift person,
- What are the new requests came into our queue
- What are the tasks are pending
- What are the calls need to attend
- How many tickets are closed?
- How many critical tickets are came in shift?
SQL DBA