Sweep Away the Garbage
for scalable, fault-tolerant shared VM storage
Adam Litke - alitke@redhat.com
FOSDEM 2016 - 30 January 2016
The next 40 minutes
oVirt shared storage architecture
Preventing data corruption
Recovering from failure
Examples
Local VM storage
Multi-host local VM storage
Shared VM storage
oVirt shared storage
oVirt storage domain
oVirt image
oVirt volume
Storage operations
Datapath operations
A VM or host accessing volume contents
These are the most common and most important
Lots of IO
Long-running
Narrow in scope
Example: VM volume access
Example: Host volume access
Metadata operations
Adding / removing / rearranging storage objects
Changing storage domain metadata
Minimal IO
Short-running
Can have broad scope
Example: create volume
Example: delete image
Challenge: conflicts
Preventing conflicts
Requirement: data integrity
Goal: maximize concurrency
Interaction between storage objects is complex
Orchestration is required across several levels
User actions
Hosts
Local threads
Same VM on multiple hosts
Conflicting metadata updates
Run VM during snapshot
Solution: Locking
Management-level locking
Entities are locked while executing user-driven actions
Lock an image during creation
Lock a VM while taking a snapshot
Lock a host while it modifies storage
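Management-level locking can be pictured as a registry of busy entities. The sketch below is a minimal in-memory illustration, not oVirt engine code: `EntityLockRegistry`, `try_lock`, and the `vm:42` style entity IDs are hypothetical, and a conflicting action is assumed to be rejected rather than queued. It reproduces the "Run VM during snapshot" conflict.

```python
import threading


class EntityLockRegistry:
    """Hypothetical registry of entities held by running user actions."""

    def __init__(self):
        self._mutex = threading.Lock()
        self._owners = {}  # entity id -> action currently holding it

    def try_lock(self, action, entities):
        """Lock every entity for `action`, or none of them if any is busy."""
        with self._mutex:
            if any(e in self._owners for e in entities):
                return False  # reject the action instead of waiting
            for e in entities:
                self._owners[e] = action
            return True

    def unlock(self, action, entities):
        """Release only the entities that `action` actually holds."""
        with self._mutex:
            for e in entities:
                if self._owners.get(e) == action:
                    del self._owners[e]


registry = EntityLockRegistry()
snapshot_ok = registry.try_lock("snapshot", ["vm:42", "image:7"])
run_ok = registry.try_lock("run-vm", ["vm:42"])  # Run VM during snapshot
registry.unlock("snapshot", ["vm:42", "image:7"])
run_later_ok = registry.try_lock("run-vm", ["vm:42"])
```

Taking all of an action's entities in one all-or-nothing call avoids an action holding some entities while it waits for others.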
Shared storage locking
Implemented using sanlock
Lockspace is on shared storage
Leases grant hosts exclusive access to storage resources
Storage domain lease: needed for metadata changes
Volume lease: protects volume contents
More about sanlock
Host IDs
Every host has a unique ID
Uniqueness is enforced by sanlock
IDs must be periodically renewed
A host that fails to renew its ID surrenders all of its resource leases
Resource leases
Represent an arbitrary resource (storage or otherwise)
Misbehaving hosts will be fenced (rebooted)
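To make host ID renewal and lease surrender concrete, here is a toy in-memory model. It is hypothetical and deliberately simplified: `ToyLockspace`, its methods, and the 80-second timeout are illustrative assumptions, time is passed in explicitly as `now`, and neither sanlock's on-disk algorithms nor fencing are modeled.

```python
class ToyLockspace:
    """In-memory model of host IDs and resource leases.

    Hypothetical illustration only: sanlock keeps this state on shared
    storage and uses its own on-disk algorithms, which are not modeled here.
    """

    def __init__(self, timeout):
        self.timeout = timeout  # seconds a host ID stays valid after renewal
        self.host_ids = {}      # host id -> time of last renewal
        self.leases = {}        # resource name -> owning host id

    def alive(self, host_id, now):
        last = self.host_ids.get(host_id)
        return last is not None and now - last < self.timeout

    def add_host(self, host_id, now):
        if self.alive(host_id, now):
            raise ValueError(f"host id {host_id} is already in use")
        self.host_ids[host_id] = now

    def renew(self, host_id, now):
        if not self.alive(host_id, now):
            raise RuntimeError(f"host id {host_id} has expired")
        self.host_ids[host_id] = now

    def acquire(self, host_id, resource, now):
        if not self.alive(host_id, now):
            raise RuntimeError("a live host id is required to hold leases")
        owner = self.leases.get(resource)
        if owner not in (None, host_id) and self.alive(owner, now):
            return False  # another live host holds the lease
        # Leases of a host whose ID has expired are treated as surrendered.
        self.leases[resource] = host_id
        return True


ls = ToyLockspace(timeout=80)
ls.add_host(1, now=0)
ls.add_host(2, now=0)
first = ls.acquire(1, "volume-A", now=10)
blocked = ls.acquire(2, "volume-A", now=20)
ls.renew(2, now=70)
taken_over = ls.acquire(2, "volume-A", now=100)  # host 1 never renewed
```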
Process-level locking
Implemented with a local lock manager and RWLocks
Locks grant threads either shared or exclusive access
Storage domain lock: protects metadata
Image lock: protects volume chain and metadata
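Python's standard library has no reader-writer lock, so this sketch builds one from `threading.Condition` and pairs it with a minimal lock manager keyed by `(namespace, name)`. `RWLock` and `LockManager` are hypothetical names, not the oVirt/VDSM API.

```python
import threading
from contextlib import contextmanager


class RWLock:
    """Readers share the lock; a writer holds it exclusively."""

    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0
        self._writer = False

    @contextmanager
    def shared(self):
        with self._cond:
            while self._writer:
                self._cond.wait()
            self._readers += 1
        try:
            yield
        finally:
            with self._cond:
                self._readers -= 1
                self._cond.notify_all()

    @contextmanager
    def exclusive(self):
        with self._cond:
            while self._writer or self._readers:
                self._cond.wait()
            self._writer = True
        try:
            yield
        finally:
            with self._cond:
                self._writer = False
                self._cond.notify_all()


class LockManager:
    """Hands out one RWLock per (namespace, name), e.g. ("image", uuid)."""

    def __init__(self):
        self._mutex = threading.Lock()
        self._locks = {}

    def get(self, namespace, name):
        with self._mutex:
            return self._locks.setdefault((namespace, name), RWLock())


manager = LockManager()
image_lock = manager.get("image", "img-1")
writer_done = threading.Event()


def writer():
    with image_lock.exclusive():
        writer_done.set()


with image_lock.shared():
    with manager.get("image", "img-1").shared():  # a second reader is admitted
        t = threading.Thread(target=writer)
        t.start()
        blocked_while_reading = not writer_done.wait(0.2)
t.join(timeout=5)
```

This version admits new readers while a writer is waiting, so a steady stream of readers can starve a writer; a production lock would likely need a fairness policy.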
Challenge: interruptions
Handling interruptions
Some steps in a task may never complete
Interruptions happen naturally or due to bugs
Power or network outage
Hardware failure
Software failure
Must be carefully mitigated to keep storage coherent
Approaches
Storage task manager with rollback capability
Storage transactions with garbage collection
Interrupted volume creation
Interrupted volume copy
Solution: Transactional Storage
Storage transactions
Garbage collection
Monitoring and resolution
Storage transactions
Each storage command must be a single transaction
A transaction is opened with a marker operation
Subsequent steps accumulate "garbage" on storage
A transaction is committed by converting the start marker into its final, permanent form
Example
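A minimal sketch of the marker pattern, assuming a file-based storage domain in which the marker is a directory named with a hypothetical `.volatile` suffix and commit is a single `os.rename` to the final name. Block-based domains would need a different kind of marker. Nothing is rolled back on failure: the marker and whatever was created inside it are simply left for the garbage collector.

```python
import os
import tempfile
from contextlib import contextmanager

VOLATILE_SUFFIX = ".volatile"  # hypothetical marker naming convention


@contextmanager
def storage_transaction(final_path):
    """Open with a marker, commit by renaming it; never roll back."""
    marker = final_path + VOLATILE_SUFFIX
    os.mkdir(marker)               # open: the marker appears on storage
    yield marker                   # steps accumulate "garbage" inside it
    os.rename(marker, final_path)  # commit: rename(2) is atomic on POSIX
    # If the body raises, the rename is skipped and the marker is left for GC.


domain = tempfile.mkdtemp()
with storage_transaction(os.path.join(domain, "image-1")) as tmp:
    open(os.path.join(tmp, "volume.meta"), "w").close()

try:
    with storage_transaction(os.path.join(domain, "image-2")) as tmp:
        open(os.path.join(tmp, "volume.meta"), "w").close()
        raise OSError("simulated outage")
except OSError:
    pass

print(sorted(os.listdir(domain)))  # ['image-1', 'image-2.volatile']
```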
Garbage collection
Runs periodically on an arbitrary host
Identifies candidates by finding markers
Acquires necessary locks for the candidate
Verifies the candidate should be collected
Cleans garbage associated with the marker
Removes the marker
Example: successful collection
Identify candidate ➡ Acquire locks ➡ Clean ➡ Remove marker
Example: aborted collection
Identify candidate ➡ Acquire locks ➡ Abort
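One garbage-collection pass could look like the sketch below. It assumes markers are directories with a hypothetical `.volatile` suffix and uses a per-image `threading.Lock` as a stand-in for the domain lease and image lock. A busy lock means a live command still owns the marker, so the candidate is skipped; verification here only re-checks that the marker still exists once the lock is held (a concurrent collector may have removed it). Running the pass periodically, on whichever host is chosen, is left to the caller.

```python
import os
import shutil
import tempfile
import threading

VOLATILE_SUFFIX = ".volatile"  # hypothetical marker naming convention


def collect_garbage(domain_dir, image_locks):
    """One GC pass. `image_locks` maps image name -> threading.Lock."""
    collected, skipped = [], []
    for entry in sorted(os.listdir(domain_dir)):
        if not entry.endswith(VOLATILE_SUFFIX):   # identify candidates
            continue
        image = entry[: -len(VOLATILE_SUFFIX)]
        lock = image_locks.setdefault(image, threading.Lock())
        if not lock.acquire(blocking=False):      # acquire locks
            skipped.append(image)                 # a live command owns it
            continue
        try:
            marker = os.path.join(domain_dir, entry)
            if not os.path.isdir(marker):         # verify, otherwise abort
                skipped.append(image)
                continue
            for name in os.listdir(marker):       # clean the garbage
                path = os.path.join(marker, name)
                if os.path.isdir(path):
                    shutil.rmtree(path)
                else:
                    os.unlink(path)
            os.rmdir(marker)                      # remove the marker last
            collected.append(image)
        finally:
            lock.release()
    return collected, skipped


domain = tempfile.mkdtemp()
for name in ("img-a.volatile", "img-b.volatile", "img-c"):
    os.mkdir(os.path.join(domain, name))
open(os.path.join(domain, "img-a.volatile", "vol.data"), "w").close()
locks = {"img-b": threading.Lock()}
locks["img-b"].acquire()  # pretend a command is still working on img-b
collected, skipped = collect_garbage(domain, locks)
```

The marker is removed only after its contents are gone, so a collector that is itself interrupted leaves the marker in place for the next pass.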
Monitoring and resolution
Running commands raise events or can be polled
Progress
State changes
Error code and context
Command results are not persistent
Success or failure is evident by examining storage
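A sketch of both reporting styles, with hypothetical names throughout: a `Job` object whose fields can be polled and which pushes events to subscribers, and `outcome_from_storage`, which assumes a `.volatile` marker convention to infer a result after the in-memory job is gone.

```python
import os
import tempfile
import threading

VOLATILE_SUFFIX = ".volatile"  # hypothetical marker naming convention


class Job:
    """Hypothetical in-memory handle for one running storage command."""

    def __init__(self):
        self._mutex = threading.Lock()
        self._listeners = []
        self.state = "pending"
        self.progress = 0   # percent complete, for polling
        self.error = None   # (code, context) if the command fails

    def subscribe(self, callback):
        self._listeners.append(callback)

    def update(self, **changes):
        with self._mutex:
            for key, value in changes.items():
                setattr(self, key, value)
            event = {"state": self.state, "progress": self.progress,
                     "error": self.error}
        for callback in self._listeners:  # push an event to every subscriber
            callback(event)


def outcome_from_storage(domain_dir, image):
    """Job results are not persisted; infer the outcome from storage."""
    if os.path.isdir(os.path.join(domain_dir, image)):
        return "committed"
    if os.path.isdir(os.path.join(domain_dir, image + VOLATILE_SUFFIX)):
        return "incomplete"  # still running, or interrupted and awaiting GC
    return "absent"          # never started, or interrupted and collected


events = []
job = Job()
job.subscribe(events.append)
job.update(state="running", progress=50)
job.update(state="failed", error=("ENOSPC", {"image": "img-2"}))

domain = tempfile.mkdtemp()
os.mkdir(os.path.join(domain, "img-1"))
os.mkdir(os.path.join(domain, "img-2" + VOLATILE_SUFFIX))
```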
Practical examples
Create volume
Remove volume
Clone volume
Example: Create volume
Acquire domain lease
Acquire image lock
Create volatile image directory
Create volatile metadata file
Create lease file
Create volume data file
Commit metadata file
Commit image directory
Release image lock
Release domain lease
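The create-volume sequence maps onto nested `with` blocks. In this sketch `threading.Lock` objects stand in for the storage domain lease and the image lock, and the file names (`.meta`, `.lease`, `.volatile`) and metadata format are hypothetical, not oVirt's on-disk layout.

```python
import os
import tempfile
import threading

VOLATILE = ".volatile"  # hypothetical marker suffix

# Hypothetical stand-ins: on a real host these would be the sanlock storage
# domain lease and an image lock from the local lock manager.
DOMAIN_LEASE = threading.Lock()
IMAGE_LOCKS = {}


def create_volume(domain_dir, image, volume, size):
    image_lock = IMAGE_LOCKS.setdefault(image, threading.Lock())
    with DOMAIN_LEASE:                                   # acquire domain lease
        with image_lock:                                 # acquire image lock
            tmp_dir = os.path.join(domain_dir, image + VOLATILE)
            os.mkdir(tmp_dir)                            # volatile image directory
            meta = os.path.join(tmp_dir, volume + ".meta")
            with open(meta + VOLATILE, "w") as f:        # volatile metadata file
                f.write(f"SIZE={size}\n")                # hypothetical format
            # Empty placeholder; a real lease file must be initialized for sanlock.
            open(os.path.join(tmp_dir, volume + ".lease"), "wb").close()
            with open(os.path.join(tmp_dir, volume), "wb") as f:
                f.truncate(size)                         # volume data file
            os.rename(meta + VOLATILE, meta)             # commit metadata file
            os.rename(tmp_dir, os.path.join(domain_dir, image))  # commit image dir
        # the image lock is released here ...
    # ... and then the domain lease


domain = tempfile.mkdtemp()
create_volume(domain, "img-1", "vol-1", 1024 * 1024)
print(sorted(os.listdir(os.path.join(domain, "img-1"))))
# ['vol-1', 'vol-1.lease', 'vol-1.meta']
```

If the host dies before the final rename, only `img-1.volatile` remains and the garbage collector removes it; after the rename, the image is complete.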
Example: Remove volume
Existing volume
Acquire domain lease
Acquire image lock
Make image volatile
Invoke the garbage collector
Release image lock
Release domain lease
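Removal reuses the collector. This sketch assumes hypothetical names, `threading.Lock` stand-ins for the domain lease and image lock, and a single-volume image stored as a flat directory. Renaming the image to a `.volatile` marker is the point of no return: if the host dies right after it, a periodic collector finishes the job. The cleanup is called directly because the caller already holds the image lock, which is not reentrant here.

```python
import os
import tempfile
import threading

VOLATILE = ".volatile"  # hypothetical marker suffix

# Hypothetical stand-ins for the sanlock domain lease and local image locks.
DOMAIN_LEASE = threading.Lock()
IMAGE_LOCKS = {}


def _clean_marker(marker):
    """Collect one marker; the caller must already hold the needed locks."""
    for name in os.listdir(marker):
        os.unlink(os.path.join(marker, name))  # assumes a flat image directory
    os.rmdir(marker)


def remove_image(domain_dir, image):
    image_lock = IMAGE_LOCKS.setdefault(image, threading.Lock())
    with DOMAIN_LEASE, image_lock:    # domain lease first, then image lock
        committed = os.path.join(domain_dir, image)
        marker = committed + VOLATILE
        os.rename(committed, marker)  # make image volatile: it is now garbage
        _clean_marker(marker)         # invoke the collector right away
    # on exit the image lock is released first, then the domain lease


domain = tempfile.mkdtemp()
os.mkdir(os.path.join(domain, "img-1"))
for name in ("vol-1", "vol-1.meta", "vol-1.lease"):
    open(os.path.join(domain, "img-1", name), "w").close()
remove_image(domain, "img-1")
```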
Example: Clone volume
Existing volume
Create another volume
Acquire source image lock
Acquire target image lock
Acquire source volume lease
Acquire target volume lease
Mark target volume illegal
Copy data
Progress event
Mark target volume legal
Release target volume lease
Release source volume lease
Release target image lock
Release source image lock
Completion event
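The clone steps can be sketched with `contextlib.ExitStack`, which releases everything it acquired in reverse order, matching the release sequence listed above. Assumptions: `threading.Lock` objects stand in for image locks and sanlock volume leases, legality is stored in a hypothetical `<volume>.meta` file, and events are plain dictionaries passed to a callback. Requires Python 3.8+.

```python
import os
import tempfile
import threading
from contextlib import ExitStack

LOCKS = {}  # hypothetical stand-ins for image locks and sanlock volume leases


def _lock(kind, name):
    return LOCKS.setdefault((kind, name), threading.Lock())


def _set_legality(volume_path, legality):
    with open(volume_path + ".meta", "w") as f:  # hypothetical metadata format
        f.write(f"LEGALITY={legality}\n")


def clone_volume(src, dst, src_image, dst_image, on_event, chunk=64 * 1024):
    """Copy the data of volume `src` into the already-created volume `dst`."""
    with ExitStack() as stack:
        # Acquire in the listed order; ExitStack releases in reverse order.
        for kind, name in (("image", src_image), ("image", dst_image),
                           ("lease", src), ("lease", dst)):
            stack.enter_context(_lock(kind, name))
        _set_legality(dst, "ILLEGAL")  # an interrupted copy stays visibly unusable
        total = os.path.getsize(src)
        copied = 0
        with open(src, "rb") as fin, open(dst, "r+b") as fout:
            while data := fin.read(chunk):
                fout.write(data)
                copied += len(data)
                on_event({"type": "progress", "percent": 100 * copied // total})
        _set_legality(dst, "LEGAL")
    on_event({"type": "completed", "volume": dst})  # sent after locks are released


domain = tempfile.mkdtemp()
src = os.path.join(domain, "vol-src")
dst = os.path.join(domain, "vol-dst")
with open(src, "wb") as f:
    f.write(os.urandom(200 * 1024))
open(dst, "wb").close()  # the target volume created in the earlier step
events = []
clone_volume(src, dst, "img-src", "img-dst", events.append)
```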
Locking order
Strict ordering rules are needed to prevent deadlock
Storage leases before local locks
Big containers before smaller containers
Storage Domain ➡ Image ➡ Volume
Source volume before destination volume
Release the newest locks first
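Reading the rules so that they reproduce both the create and clone sequences, container size has to be the primary key (clone takes image locks before volume leases), with leases before local locks and source before destination as tie-breakers. The sketch below encodes that one reading as a sort key; `LockRequest` and its fields are hypothetical. If every thread acquires in this same total order, no cycle of waiters, and therefore no deadlock, can form among these locks.

```python
from dataclasses import dataclass

LEVEL = {"domain": 0, "image": 1, "volume": 2}  # big containers first
KIND = {"lease": 0, "lock": 1}                 # storage leases before local locks
ROLE = {"source": 0, "target": 1}              # source before destination


@dataclass(frozen=True)
class LockRequest:
    kind: str   # "lease" (shared storage) or "lock" (local)
    level: str  # "domain", "image" or "volume"
    role: str   # "source" or "target"
    name: str

    def order_key(self):
        # The name is a final tie-breaker so the order is always total.
        return (LEVEL[self.level], KIND[self.kind], ROLE[self.role], self.name)


def acquisition_order(requests):
    return sorted(requests, key=LockRequest.order_key)


def release_order(requests):
    # Release the newest locks first: the exact reverse of acquisition.
    return list(reversed(acquisition_order(requests)))


clone = [
    LockRequest("lease", "volume", "target", "vol-dst"),
    LockRequest("lock", "image", "source", "img-src"),
    LockRequest("lease", "volume", "source", "vol-src"),
    LockRequest("lock", "image", "target", "img-dst"),
]
create = [
    LockRequest("lock", "image", "target", "img-1"),
    LockRequest("lease", "domain", "target", "sd-1"),
]
print([r.name for r in acquisition_order(clone)])
# ['img-src', 'img-dst', 'vol-src', 'vol-dst']
```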
Join us!
http://www.ovirt.org
irc://irc.oftc.net/ovirt
http://lists.ovirt.org/mailman/listinfo/devel
http://lists.ovirt.org/mailman/listinfo/users
Questions?