1
00:00:00,000 --> 00:00:06,200
Welcome to the Azure Security Podcast,

2
00:00:06,200 --> 00:00:09,360
where we discuss topics relating to security, privacy,

3
00:00:09,360 --> 00:00:13,680
reliability, and compliance on the Microsoft Cloud Platform.

4
00:00:13,680 --> 00:00:16,880
Hey everybody, welcome to Episode 53.

5
00:00:16,880 --> 00:00:19,320
This week is a special episode.

6
00:00:19,320 --> 00:00:22,520
We're going to talk about a new feature that's coming in

7
00:00:22,520 --> 00:00:25,080
Azure SQL DB named Ledger.

8
00:00:25,080 --> 00:00:27,560
This week is myself, Michael and Sarah,

9
00:00:27,560 --> 00:00:30,160
and our guest this week is Peter Vanhove.

10
00:00:30,160 --> 00:00:33,560
We'll also have no news this week because it is a special episode.

11
00:00:33,560 --> 00:00:36,240
Peter, thank you so much for joining us this week.

12
00:00:36,240 --> 00:00:38,080
Would you like to spend a moment and just give

13
00:00:38,080 --> 00:00:39,760
our listeners a background about yourself?

14
00:00:39,760 --> 00:00:42,240
Yeah. Hi. Good morning,

15
00:00:42,240 --> 00:00:45,120
good afternoon, wherever you are in the world listening to us.

16
00:00:45,120 --> 00:00:46,160
My name is Peter.

17
00:00:46,160 --> 00:00:50,680
I'm a Program Manager at Microsoft in the Data Platform Security Team.

18
00:00:50,680 --> 00:00:55,820
So our team is responsible for all security features inside

19
00:00:55,820 --> 00:00:59,680
SQL products like SQL Server, Azure SQL DB,

20
00:00:59,680 --> 00:01:03,380
MI, and I'm the feature Program Manager of Ledger,

21
00:01:03,380 --> 00:01:05,940
the feature we're going to talk about today.

22
00:01:05,940 --> 00:01:07,520
Thanks for that, Peter.

23
00:01:07,520 --> 00:01:11,020
So tell us what is Ledger?

24
00:01:11,020 --> 00:01:12,160
What does it do?

25
00:01:12,160 --> 00:01:17,600
Why is it so exciting and why are we having a special episode on it?

26
00:01:17,600 --> 00:01:19,560
Yeah, sure. I can explain.

27
00:01:19,560 --> 00:01:24,140
Imagine that you're working at a financial or

28
00:01:24,140 --> 00:01:28,040
medical company and that you have

29
00:01:28,040 --> 00:01:30,880
very sensitive data inside your database.

30
00:01:30,880 --> 00:01:35,600
How can you be 100 percent sure that nobody has actually tampered with your data?

31
00:01:35,600 --> 00:01:37,800
If I look at my previous life,

32
00:01:37,800 --> 00:01:43,480
I was a DBA so I literally had full control of all the databases.

33
00:01:43,480 --> 00:01:44,680
So if I wanted to,

34
00:01:44,680 --> 00:01:47,640
I could just log in into all these databases,

35
00:01:47,640 --> 00:01:52,280
tampered with the data, erase all my traces, and you're good to go.

36
00:01:52,280 --> 00:01:56,040
Nobody would ever have noticed that I have modified the data.

37
00:01:56,040 --> 00:02:00,360
So it's really crucial for these financial and medical companies

38
00:02:00,360 --> 00:02:03,480
that they can prove that the data can be trusted.

39
00:02:03,480 --> 00:02:06,720
So currently for them, it's really hard to do.

40
00:02:06,720 --> 00:02:09,000
You can try to switch on auditing.

41
00:02:09,000 --> 00:02:10,360
That will help, of course.

42
00:02:10,360 --> 00:02:13,920
But then again, if you look at my role as a DBA,

43
00:02:13,920 --> 00:02:16,280
I could easily switch off the auditing,

44
00:02:16,280 --> 00:02:19,720
do my modification, then switch on auditing again.

45
00:02:19,720 --> 00:02:26,120
These companies are, like I said, not 100 percent sure that they can trust their data.

46
00:02:26,120 --> 00:02:28,240
On the other side, if we look at

47
00:02:28,240 --> 00:02:32,200
multi-party companies or workflows, let's say,

48
00:02:32,200 --> 00:02:34,240
typically what these companies are doing,

49
00:02:34,240 --> 00:02:37,360
they are looking into blockchain technologies to

50
00:02:37,360 --> 00:02:42,960
guarantee the integrity of the data and the business logic in an untrusted environment.

51
00:02:42,960 --> 00:02:47,680
Thing with blockchain is that it's decentralized structure,

52
00:02:47,680 --> 00:02:52,320
and it represents significant challenge for these companies.

53
00:02:52,320 --> 00:02:55,600
When it comes to real-world production workloads,

54
00:02:55,600 --> 00:02:57,520
it's completely decentralized,

55
00:02:57,520 --> 00:02:59,080
it's not really performant,

56
00:02:59,080 --> 00:03:04,240
and there are many scenarios where blockchain solutions is really an overkill,

57
00:03:04,240 --> 00:03:06,960
and it doesn't really justify the cost and

58
00:03:06,960 --> 00:03:11,880
the performance overhead of implementing a complete blockchain structure.

59
00:03:11,880 --> 00:03:17,440
So that's why we implemented Ledger to address these two problems.

60
00:03:17,440 --> 00:03:22,760
So to address or how can we show that or prove that

61
00:03:22,760 --> 00:03:27,440
our data that is inside the database is completely trusted.

62
00:03:27,440 --> 00:03:33,720
So is Ledger a service or is it actually built into the product?

63
00:03:33,720 --> 00:03:36,600
I'm not a database guru like Michael,

64
00:03:36,600 --> 00:03:39,000
so I'm asking the noob questions.

65
00:03:39,000 --> 00:03:42,360
It's a feature. So it's not a new flavor of

66
00:03:42,360 --> 00:03:45,120
an Azure SQL database or SQL Server.

67
00:03:45,120 --> 00:03:50,880
No, it's just a new feature that is enabled in every service tier of Azure SQL DB.

68
00:03:50,880 --> 00:03:56,400
So yeah, you just need to let's say switch it on by creating what we call a letter table.

69
00:03:56,400 --> 00:03:58,440
I'll come back to that immediately.

70
00:03:58,440 --> 00:04:02,200
So no, there's no extra service that needs to be enabled,

71
00:04:02,200 --> 00:04:06,400
and this feature is currently available in Azure SQL DB.

72
00:04:06,400 --> 00:04:10,640
So as Michael said, we're just launching the GA today,

73
00:04:10,640 --> 00:04:17,760
and the feature is also available in the new version, the box version of SQL Server, SQL Server 2022.

74
00:04:17,760 --> 00:04:24,640
Nice. Just because it's always something to get out of the way because people have budgets,

75
00:04:24,640 --> 00:04:27,280
but do we charge for it? Is there an extra charge?

76
00:04:27,280 --> 00:04:30,640
How does that work for that particular feature?

77
00:04:30,640 --> 00:04:32,640
No, it's completely free.

78
00:04:32,640 --> 00:04:35,960
So you just enable the ledger tables.

79
00:04:35,960 --> 00:04:38,760
You might end up with some extra storage

80
00:04:38,760 --> 00:04:41,120
because we're keeping track of the history.

81
00:04:41,120 --> 00:04:43,880
Again, I'll talk about that a little bit later,

82
00:04:43,880 --> 00:04:47,040
but that will basically be the extra cost of the storage.

83
00:04:47,040 --> 00:04:51,160
You don't need to pay anything extra for this feature.

84
00:04:51,160 --> 00:04:55,120
And as I said, it's available in all different service tiers.

85
00:04:55,120 --> 00:04:57,960
Even in the basic service tier, the cheapest one,

86
00:04:57,960 --> 00:05:01,840
you are able to use the ledger feature there.

87
00:05:01,840 --> 00:05:04,200
All right, Peter. So let's get into some of the nitty-gritty.

88
00:05:04,200 --> 00:05:05,800
So how does this thing work?

89
00:05:05,800 --> 00:05:10,520
I mean, you just mentioned before that you can create a table and say it's a ledger table.

90
00:05:10,520 --> 00:05:12,240
So what happens next?

91
00:05:12,240 --> 00:05:19,040
Yeah, so it's really about bringing the power of blockchain into the SQL database.

92
00:05:19,040 --> 00:05:23,640
So we're making the SQL or the data that is inside the ledger table

93
00:05:23,640 --> 00:05:26,120
tamper evidence by using cryptographic.

94
00:05:26,120 --> 00:05:28,160
And when we talk about cryptographies,

95
00:05:28,160 --> 00:05:33,080
literally you should see it as the pattern of cryptographic

96
00:05:33,080 --> 00:05:36,080
that you see in blockchain nowadays.

97
00:05:36,080 --> 00:05:37,920
So how does it work?

98
00:05:37,920 --> 00:05:40,880
We have two different types of tables.

99
00:05:40,880 --> 00:05:45,120
We have an updatable ledger table and we have an append-only ledger table.

100
00:05:45,120 --> 00:05:49,320
Typically, updatable ledger tables are used by applications, of course,

101
00:05:49,320 --> 00:05:54,320
so they need to be able to do insert update delete statements.

102
00:05:54,320 --> 00:05:57,640
The other flavor that we have is append-only.

103
00:05:57,640 --> 00:05:59,760
And like I said, you can just append records.

104
00:05:59,760 --> 00:06:05,920
So we only allow insert statements, we don't allow any update or delete statements

105
00:06:05,920 --> 00:06:08,280
whatsoever on these particular tables.

106
00:06:08,280 --> 00:06:10,680
So the user can actually choose between those two.

107
00:06:10,680 --> 00:06:12,880
Let's go for the updatable ledger table.

108
00:06:12,880 --> 00:06:19,040
So what happens is when you update a particular column in a row of a ledger table,

109
00:06:19,040 --> 00:06:23,400
what we're going to do is we're going to take the previous version of that row

110
00:06:23,400 --> 00:06:26,840
and we're going to put that in what we call a history table.

111
00:06:26,840 --> 00:06:30,720
People that are familiar with SQL Server, they will say, yeah, hey, we already have that.

112
00:06:30,720 --> 00:06:32,000
It's called temporal tables.

113
00:06:32,000 --> 00:06:33,320
Well, that's true.

114
00:06:33,320 --> 00:06:37,800
And I can even tell you more on the ledger feature is built on top of the technology

115
00:06:37,800 --> 00:06:40,560
of temporal table, but it's more advanced.

116
00:06:40,560 --> 00:06:45,160
It's better because we have the cryptographic parts under the hood as well.

117
00:06:45,160 --> 00:06:51,200
So what we're doing, when we move that previous version of the row to the history table,

118
00:06:51,200 --> 00:06:53,840
we also going to add extra metadata to that.

119
00:06:53,840 --> 00:07:00,800
So for example, the transaction ID, the user that has actually executed that transaction,

120
00:07:00,800 --> 00:07:02,800
the commit timestamp of that transaction.

121
00:07:02,800 --> 00:07:06,120
So everything is then stored in that history table.

122
00:07:06,120 --> 00:07:10,240
On top of that, when you create a ledger table,

123
00:07:10,240 --> 00:07:13,000
we're also going to create what we call a ledger view.

124
00:07:13,000 --> 00:07:16,760
And the ledger view gives you a chronicle overview of your database.

125
00:07:16,760 --> 00:07:20,160
So it shows you the current value of your records,

126
00:07:20,160 --> 00:07:24,960
but it also shows you the historical values of the record.

127
00:07:24,960 --> 00:07:32,720
So we make your life a little bit easier by default creating a ledger view for you.

128
00:07:32,720 --> 00:07:37,240
So this is how we keep track of the historical values of your ledger table.

129
00:07:37,240 --> 00:07:40,320
Now, how is the data then actually protected?

130
00:07:40,320 --> 00:07:43,600
And it is protected in what we call a database ledger.

131
00:07:43,600 --> 00:07:48,080
So we're basically using a blockchain structure under the hood

132
00:07:48,080 --> 00:07:52,920
that is implemented in SQL Server or in Azure SQL DB.

133
00:07:52,920 --> 00:07:54,040
What does it mean?

134
00:07:54,040 --> 00:08:02,000
So every transaction is hashed by using a SHA-256 hashing mechanism,

135
00:08:02,000 --> 00:08:05,880
similar what you see in blockchain nowadays.

136
00:08:05,880 --> 00:08:07,280
Now, what are we doing?

137
00:08:07,280 --> 00:08:12,880
Every row in a transaction is hashed by using a Merkle Tree data structure

138
00:08:12,880 --> 00:08:14,760
that creates a root hash.

139
00:08:14,760 --> 00:08:18,240
So for those who don't know what a Merkle Tree is,

140
00:08:18,240 --> 00:08:23,560
what we're using as the input for the hash function is the row information,

141
00:08:23,560 --> 00:08:27,680
like how many columns did we update, what were the columns,

142
00:08:27,680 --> 00:08:31,080
what were the column types, and so on.

143
00:08:31,080 --> 00:08:35,960
So we use that information as input for the hash functions.

144
00:08:35,960 --> 00:08:38,600
And then we're going to hash these rows,

145
00:08:38,600 --> 00:08:41,720
and we're going to produce what we call a parent hash.

146
00:08:41,720 --> 00:08:46,920
And then these parent hashes are then hashed again into new hash and so on and so on,

147
00:08:46,920 --> 00:08:50,480
until ultimately we produce a root hash

148
00:08:50,480 --> 00:08:55,280
that is representing all the rows in that single transaction.

149
00:08:55,280 --> 00:08:57,800
That's the first step to what we're doing.

150
00:08:57,800 --> 00:09:02,320
And secondly, we batch all these transactions into blocks.

151
00:09:02,320 --> 00:09:08,120
So what does that mean is like in Azure SQL DB, every 30 seconds,

152
00:09:08,120 --> 00:09:13,200
we're going to create a new block that will consist all the transactions

153
00:09:13,200 --> 00:09:16,720
that have occurred during that time period.

154
00:09:16,720 --> 00:09:20,160
And that block is then also cryptographically hashed

155
00:09:20,160 --> 00:09:23,040
and linked to the previous block, just like in the blockchain.

156
00:09:23,040 --> 00:09:28,120
So that's how we're building that blockchain technology,

157
00:09:28,120 --> 00:09:30,960
or that's how we built the blockchain, in fact.

158
00:09:30,960 --> 00:09:34,120
At that point, once we have produced the block,

159
00:09:34,120 --> 00:09:37,400
also what we call a database digest,

160
00:09:37,400 --> 00:09:43,400
we're going to push out that block or the database digest out of SQL server.

161
00:09:43,400 --> 00:09:46,360
And we're going to push it into trusted storage,

162
00:09:46,360 --> 00:09:50,080
like for example, Azure Immutable Blob Storage,

163
00:09:50,080 --> 00:09:54,640
or we also currently have a new service now called Azure Confidential Ledger.

164
00:09:54,640 --> 00:09:59,520
So you can choose between both of them, what you prefer.

165
00:09:59,520 --> 00:10:03,480
And once this is pushed out, it is stored in Immutable Storage,

166
00:10:03,480 --> 00:10:07,560
which means nobody can there actually tamper with that data.

167
00:10:07,560 --> 00:10:12,600
Now it becomes interesting, of course, because remember when I said,

168
00:10:12,600 --> 00:10:14,520
what is the problem that we're trying to solve?

169
00:10:14,520 --> 00:10:17,520
Well, we need to build trust and we need to have that.

170
00:10:17,520 --> 00:10:24,400
We need to make our customers that they're able to prove

171
00:10:24,400 --> 00:10:26,040
that the data can be trusted.

172
00:10:26,040 --> 00:10:28,720
So how can they prove this right now?

173
00:10:28,720 --> 00:10:34,440
Because we're storing the database digest on a different location,

174
00:10:34,440 --> 00:10:38,080
what we can do next is we can run a stored procedure

175
00:10:38,080 --> 00:10:44,600
that is going to verify the database digest that are outside the database.

176
00:10:44,600 --> 00:10:48,480
And we're going to compare that in what we have inside in the database,

177
00:10:48,480 --> 00:10:52,520
which means what we're going to do is we're going to recalculate

178
00:10:52,520 --> 00:10:55,560
the hashes in real time in SQL.

179
00:10:55,560 --> 00:11:00,360
And we're going to match that with the hashes that we have on the Immutable Storage.

180
00:11:00,360 --> 00:11:03,000
And if we have a match between those hashes,

181
00:11:03,000 --> 00:11:09,240
it means that we are 100% sure that nobody has actually tampered with the data.

182
00:11:09,240 --> 00:11:14,520
If we don't have a match, it means that, hey, somebody screwed up our database,

183
00:11:14,520 --> 00:11:18,920
somebody screwed up the data that is inside that particular table.

184
00:11:18,920 --> 00:11:24,640
So we are sure that the data cannot be trusted and we need to figure out what happens.

185
00:11:24,640 --> 00:11:28,360
Then you just need to look up into the history tables,

186
00:11:28,360 --> 00:11:32,120
what was the transaction, and start your investigation,

187
00:11:32,120 --> 00:11:36,960
what exactly has happened with the data inside that ledger table.

188
00:11:36,960 --> 00:11:41,360
This means that it makes it really, really easy for the customers to prove

189
00:11:41,360 --> 00:11:45,640
that tampering hasn't occurred on the data and that your data can actually be trusted.

190
00:11:45,640 --> 00:11:51,280
So it's really, really powerful for our customers to have that

191
00:11:51,280 --> 00:11:57,160
and to prove that your data can be trusted to external altitudes, for example.

192
00:11:57,160 --> 00:12:02,480
Or if you're working with supply chains, for example, where you have multi-parties,

193
00:12:02,480 --> 00:12:08,480
these verifications can also be run by those multi-parties to gain trust and to know,

194
00:12:08,480 --> 00:12:12,400
okay, this data can be trusted from all the parties.

195
00:12:12,400 --> 00:12:13,760
And that's an important part, right?

196
00:12:13,760 --> 00:12:18,960
Because the hashes have to be stored on something that is tamper resistant, right?

197
00:12:18,960 --> 00:12:21,160
I mean, you could store these things on a storage account,

198
00:12:21,160 --> 00:12:25,360
but there's really no verification that they've been tampered with.

199
00:12:25,360 --> 00:12:30,160
I mean, even if you put them in a storage account with an RBAC control,

200
00:12:30,160 --> 00:12:35,240
a strong RBAC policy, the problem there is that the admins can still...

201
00:12:35,240 --> 00:12:39,840
If the RBAC policy says admin can access right over these hashes,

202
00:12:39,840 --> 00:12:43,320
a rogue admin can still write over the hashes.

203
00:12:43,320 --> 00:12:46,920
So we have to use mitigations that are basically cryptographic controls,

204
00:12:46,920 --> 00:12:51,920
or in the case of Blobstore, we can use immutable storage.

205
00:12:51,920 --> 00:12:53,720
Is that a fair comment?

206
00:12:53,720 --> 00:12:55,000
Yeah, that's correct.

207
00:12:55,000 --> 00:12:59,880
If you just use an ordinary storage account, you're right.

208
00:12:59,880 --> 00:13:04,520
Everybody that has the appropriate permissions can go into these digest files.

209
00:13:04,520 --> 00:13:08,600
These are just JSON files that we produce with the hashes inside.

210
00:13:08,600 --> 00:13:16,080
But still, you can open these files and you can tamper with whatever you want in these files.

211
00:13:16,080 --> 00:13:21,400
So if you switch on the immutable setting on the Azure Storage account,

212
00:13:21,400 --> 00:13:25,160
it's just writing once and reading multiple times,

213
00:13:25,160 --> 00:13:29,240
but there's no way you can actually modify anything in the digest.

214
00:13:29,240 --> 00:13:31,480
So that is crucial.

215
00:13:31,480 --> 00:13:34,520
Yeah, the nice thing about Merkle trees as well is that they're actually quite efficient

216
00:13:34,520 --> 00:13:39,520
when it comes to verification, which is a little bit of a fun fact

217
00:13:39,520 --> 00:13:41,400
for all you people and not crypto nerds.

218
00:13:41,400 --> 00:13:45,360
Yeah, so we also have the Azure Confidential Ledger, which is a new feature

219
00:13:45,360 --> 00:13:49,840
that can also be used for storage of the hashes.

220
00:13:49,840 --> 00:13:52,640
But at the end of the day, that is another service, right?

221
00:13:52,640 --> 00:13:53,080
Yeah.

222
00:13:53,080 --> 00:13:57,040
Storage is a service and ACL is a service as well.

223
00:13:57,040 --> 00:13:59,520
One of the examples you gave was a malicious DBA.

224
00:13:59,520 --> 00:14:03,000
And I think that's just a fantastic scenario, right?

225
00:14:03,000 --> 00:14:05,040
So how do I protect against a malicious DBA?

226
00:14:05,040 --> 00:14:08,440
And this is really important, like a zero trust environment as well.

227
00:14:08,440 --> 00:14:13,440
If an attacker is on the network and they've completely compromised absolutely everything,

228
00:14:13,440 --> 00:14:17,440
is the data still secure from tampering?

229
00:14:17,440 --> 00:14:19,520
By data, I mean that the hashes.

230
00:14:19,520 --> 00:14:25,080
And that's why mutable storage and has a confidential ledger is just so critically important.

231
00:14:25,080 --> 00:14:31,440
I mean, their threat models are designed specifically to mitigate a rogue DBA.

232
00:14:31,440 --> 00:14:34,160
But when you're now talking a couple of days ago,

233
00:14:34,160 --> 00:14:38,040
you also mentioned another example with a hardware vendor.

234
00:14:38,040 --> 00:14:40,160
Do you want to just go through that scenario?

235
00:14:40,160 --> 00:14:42,280
Yeah, we have a vendor.

236
00:14:42,280 --> 00:14:43,720
The company is called Lenovo.

237
00:14:43,720 --> 00:14:45,760
Probably everybody knows Lenovo.

238
00:14:45,760 --> 00:14:47,960
So they're producing laptops.

239
00:14:47,960 --> 00:14:54,080
And they are one of our customers that are already using the ledger feature or their supply chain.

240
00:14:54,080 --> 00:14:57,400
So what they build is when they produce a laptop,

241
00:14:57,400 --> 00:15:00,760
they store the hardware information into a ledger table,

242
00:15:00,760 --> 00:15:06,560
like for example, the number of CPUs, the number of memory that is in that particular laptop, and so on.

243
00:15:06,560 --> 00:15:10,480
So all these specifications are stored in there.

244
00:15:10,480 --> 00:15:17,280
And once we, let's say, Michael, you order the laptop and we send the laptop to you

245
00:15:17,280 --> 00:15:21,600
and you receive the laptop, you open it up and then you need to register your laptop.

246
00:15:21,600 --> 00:15:25,760
And basically what we're going to do is we're going to verify that

247
00:15:25,760 --> 00:15:30,000
the laptop that you've received is really what you have ordered.

248
00:15:30,000 --> 00:15:33,720
And to prove that is we're going to check based on the,

249
00:15:33,720 --> 00:15:37,280
or Lenovo is going to check based on the serial number of your laptop,

250
00:15:37,280 --> 00:15:39,880
and it's going to verify that into the ledger table.

251
00:15:39,880 --> 00:15:45,200
And if that is correct, so the number of CPUs, the number of memory, the type and so on.

252
00:15:45,200 --> 00:15:50,840
If that is valid, we know that you have received the correct laptop that you've ordered

253
00:15:50,840 --> 00:15:54,560
because we know that the data in the ledger table can be trusted.

254
00:15:54,560 --> 00:16:02,160
And this is a real life example because sometimes when the laptop is delivered by a third party company or by a vendor,

255
00:16:02,160 --> 00:16:08,600
apparently it happened from time to time that instead of, let's say, if you have eight CPUs, you just get like four.

256
00:16:08,600 --> 00:16:11,640
And nobody would have ever known what has happened.

257
00:16:11,640 --> 00:16:14,840
But by using or storing that information in the ledger table,

258
00:16:14,840 --> 00:16:18,000
we're sure that if these values don't match,

259
00:16:18,000 --> 00:16:22,120
that the end user or the end customer didn't get the correct laptop.

260
00:16:22,120 --> 00:16:26,520
And that's a really good example of how you can use ledger.

261
00:16:26,520 --> 00:16:31,480
So an interesting point here is, OK, so we've got this series of hashes.

262
00:16:31,480 --> 00:16:36,120
I can verify that the data has not been tampered with by verifying the hashes.

263
00:16:36,120 --> 00:16:40,480
But let's say there has been a tampering incident and the data has been tampered with.

264
00:16:40,480 --> 00:16:42,200
How can we recover from that?

265
00:16:42,200 --> 00:16:44,280
There are two types of tampering.

266
00:16:44,280 --> 00:16:46,400
First, let me go back a little bit.

267
00:16:46,400 --> 00:16:50,200
So you gave the example of the malicious DBA, right?

268
00:16:50,200 --> 00:16:56,640
If he or she just tries to modify a record, that will be stored in the history table, right?

269
00:16:56,640 --> 00:16:58,040
That modification.

270
00:16:58,040 --> 00:17:02,320
But if we should run the verification, that would succeed, right?

271
00:17:02,320 --> 00:17:08,160
Because there has nothing really changed under the hood or there has nothing really changed

272
00:17:08,160 --> 00:17:10,600
in the database digest itself.

273
00:17:10,600 --> 00:17:12,280
So that's one way of tampering it.

274
00:17:12,280 --> 00:17:16,160
And to get to figure that out, you just need to go into the history table

275
00:17:16,160 --> 00:17:19,040
and try to find out what exactly has happened.

276
00:17:19,040 --> 00:17:26,320
There is another possibility, of course, is that the DBA, if they have SQL skills

277
00:17:26,320 --> 00:17:31,640
and they know how to tamper, for example, immediately in the data file on disk

278
00:17:31,640 --> 00:17:37,080
or if we look at our platform, if we should have a cloud operator

279
00:17:37,080 --> 00:17:41,520
that is able to tamper with the data inside the data files itself,

280
00:17:41,520 --> 00:17:45,600
then, yeah, these changes would never appear in the history table.

281
00:17:45,600 --> 00:17:52,000
But these changes or that type of tampering would be captured by the verification, right?

282
00:17:52,000 --> 00:17:57,240
Because then the hashes wouldn't match and then the verification would fail.

283
00:17:57,240 --> 00:18:02,360
Now, okay, let's say that you figured out that somebody actually has tampered with your data,

284
00:18:02,360 --> 00:18:04,680
so the verification fails.

285
00:18:04,680 --> 00:18:05,600
Indeed, a good question.

286
00:18:05,600 --> 00:18:07,800
How can you recover from this tampering?

287
00:18:07,800 --> 00:18:10,680
And you should look at two different scenarios.

288
00:18:10,680 --> 00:18:16,040
You should look at it like, okay, the tampering, does it affect the data

289
00:18:16,040 --> 00:18:23,280
that is used in further transactions or the tampering didn't affect any further transactions?

290
00:18:23,280 --> 00:18:28,040
If you look at the first example, so it affected further transactions,

291
00:18:28,040 --> 00:18:32,280
the only option that you have is doing a point-in-time restore of your database

292
00:18:32,280 --> 00:18:34,640
to the latest backup that can be verified.

293
00:18:34,640 --> 00:18:42,560
So, yeah, you will need to try an error to see, okay, if I do a restore until, let's say, yesterday

294
00:18:42,560 --> 00:18:47,120
and then I run the verification on that restored database, the verification still fails,

295
00:18:47,120 --> 00:18:49,320
okay, then we still need to go back in time.

296
00:18:49,320 --> 00:18:54,240
So, it's a manual process doing the point-in-time restore.

297
00:18:54,240 --> 00:18:59,920
And then what you can do, you can manually repair the table, the ledger stable,

298
00:18:59,920 --> 00:19:03,680
by reinserting the information, that's one option.

299
00:19:03,680 --> 00:19:08,160
If you know what information that you need to insert, but that's a manual action.

300
00:19:08,160 --> 00:19:15,560
Or the other option is that you just accept that the transactions that occurred after the restore

301
00:19:15,560 --> 00:19:16,760
that they are lost.

302
00:19:16,760 --> 00:19:20,000
So, these are the two options that you have.

303
00:19:20,000 --> 00:19:26,160
Now, if we look at the other example where the tampering didn't affect the further transactions,

304
00:19:26,160 --> 00:19:30,920
what you can do is you can retrieve the correct table ledger from backup.

305
00:19:30,920 --> 00:19:35,520
So, what you need to do, you need to restore the database under an under name.

306
00:19:35,520 --> 00:19:40,720
Then you need to figure out, okay, what was the previous value of that particular record.

307
00:19:40,720 --> 00:19:49,400
And then you can just overwrite the tampered data with the original data that you found back in the backup files.

308
00:19:49,400 --> 00:19:56,040
With that, you will correct the data, it will again, store a value in the history table, but that's okay.

309
00:19:56,040 --> 00:20:02,760
After that, yeah, the tampering will go away and, well, not the tampering will go away,

310
00:20:02,760 --> 00:20:06,720
but you will have the original value back again in the ledger table.

311
00:20:06,720 --> 00:20:08,040
So, nothing comes free.

312
00:20:08,040 --> 00:20:10,120
So, what's the performance impact?

313
00:20:10,120 --> 00:20:14,360
My guess is it's probably some kind of impact on writes and updates.

314
00:20:14,360 --> 00:20:16,880
Probably not on reads, but I'll leave that to you.

315
00:20:16,880 --> 00:20:18,160
Yeah, that's right.

316
00:20:18,160 --> 00:20:19,680
Yeah, there is an impact.

317
00:20:19,680 --> 00:20:25,520
So, there is an overhead when you switch on or when you start using a ledger table.

318
00:20:25,520 --> 00:20:31,640
So, what we did, we did a bunch of tests with two different type of loads.

319
00:20:31,640 --> 00:20:38,280
So, one load that is just typically what we see, a random read and write access of an application.

320
00:20:38,280 --> 00:20:45,240
And then another load what we did was one where we did a lot of updates,

321
00:20:45,240 --> 00:20:48,440
only single updates on a particular ledger table.

322
00:20:48,440 --> 00:20:50,040
And there is indeed an impact.

323
00:20:50,040 --> 00:20:52,360
Yeah, there is an overhead.

324
00:20:52,360 --> 00:21:01,400
Like with the regular loads, updates and inserts, we lose around 6% of performance.

325
00:21:01,400 --> 00:21:05,520
That's the consequence of having a ledger enabled.

326
00:21:05,520 --> 00:21:10,680
The thing is that, of course, if we look at the DML latency here,

327
00:21:10,680 --> 00:21:19,120
for inserts, it's pretty much okay because we don't need to keep track of historical data for insert statements.

328
00:21:19,120 --> 00:21:22,160
The only thing we need to do is we need to compute the hash.

329
00:21:22,160 --> 00:21:23,640
So, that's pretty much okay.

330
00:21:23,640 --> 00:21:30,880
But if we look at delete operations, then we need to keep track of that in the history table.

331
00:21:30,880 --> 00:21:35,600
And on top of that, we also need to compute the hash there.

332
00:21:35,600 --> 00:21:39,520
So, that's some extra performance impact.

333
00:21:39,520 --> 00:21:45,600
Even for update operations, it's even more because then we need to add two records in the history table.

334
00:21:45,600 --> 00:21:50,960
So, one for the old version and then one for the new version.

335
00:21:50,960 --> 00:21:56,560
So, we basically do a delete and then an insert again in the history table.

336
00:21:56,560 --> 00:22:05,080
So, yeah, it has an impact, but to be honest, the customers that I've been working with for proof of concepts,

337
00:22:05,080 --> 00:22:08,400
they are happy with the performance of the ledger tapes.

338
00:22:08,400 --> 00:22:11,520
They don't have any complaints at all there.

339
00:22:11,520 --> 00:22:13,000
Yeah, I think it's important, right?

340
00:22:13,000 --> 00:22:20,600
Because when you're looking at the security benefit and the non-repudiation benefits that come with ledger,

341
00:22:20,600 --> 00:22:25,160
for certain environments, it completely outweighs the performance cost.

342
00:22:25,160 --> 00:22:30,960
When we know how to solve the performance cost problem, we throw some more hardware at the problem.

343
00:22:30,960 --> 00:22:32,800
But at the end of the day, there's a feature that you absolutely need.

344
00:22:32,800 --> 00:22:34,120
And I mentioned non-repudiation.

345
00:22:34,120 --> 00:22:40,680
That's incredibly important because there are some environments where you need strong cryptographic non-repudiation.

346
00:22:40,680 --> 00:22:45,720
I mean, a log file, for one's of a better word, with just strong RBAC policies,

347
00:22:45,720 --> 00:22:52,840
still has very weak non-repudiation value because it can still be manipulated by potentially a trusted user.

348
00:22:52,840 --> 00:23:01,560
Something that uses cryptographic mechanisms to protect it and with strong write-ones, read-many storage,

349
00:23:01,560 --> 00:23:05,640
carries a lot more weight when it comes to mitigating repudiation threats.

350
00:23:05,640 --> 00:23:10,840
And in fact, for those of you out there who are into threat modeling, you may be familiar with stride,

351
00:23:10,840 --> 00:23:16,320
which is spoofing, tampering, repudiation, information disclosure, denial of service, and elevation of privilege,

352
00:23:16,320 --> 00:23:19,680
well, the R in stride is repudiation.

353
00:23:19,680 --> 00:23:30,920
And this ledger technology is a fantastic arrow to having your quiver when it comes to mechanisms for mitigating repudiation threats.

354
00:23:30,920 --> 00:23:33,440
Historically, they're kind of pretty weak.

355
00:23:33,440 --> 00:23:39,080
A lot of technologies don't support it very well, but this is one that supports exceedingly well out of the box.

356
00:23:39,080 --> 00:23:42,280
So this is, I'm really, really excited actually about ledger.

357
00:23:42,280 --> 00:23:44,400
So this is great to see.

358
00:23:44,400 --> 00:23:51,200
So Peter, one thing we ask all our guests is if you have one final thought you'd like to leave our listeners with, what would it be?

359
00:23:51,200 --> 00:23:59,680
Well, if you want to have cryptographic proof that your data can be trusted and that it has not been tampered with,

360
00:23:59,680 --> 00:24:01,200
just start looking at ledger.

361
00:24:01,200 --> 00:24:06,880
It's easy to use and it's performance and it's just SQL.

362
00:24:06,880 --> 00:24:08,600
It's a feature in SQL server.

363
00:24:08,600 --> 00:24:12,840
So go ahead and play with it and let us know what you think of it.

364
00:24:12,840 --> 00:24:14,640
Okay, Peter, thank you so much for joining us this week.

365
00:24:14,640 --> 00:24:15,480
I really appreciate it.

366
00:24:15,480 --> 00:24:19,720
I know that with the launch of this, this is going to be pretty busy.

367
00:24:19,720 --> 00:24:23,120
So again, thank you very much for joining us and to all our listeners out there.

368
00:24:23,120 --> 00:24:25,040
We hope you found this really useful.

369
00:24:25,040 --> 00:24:28,880
Ledger is certainly a technology that I'm really excited about.

370
00:24:28,880 --> 00:24:46,240
You can find us on Twitter at azuresecpod.com.

