RIGHTBRAIN BLOG

Diving Deeper Into DeepSeek

Pete Tiarks (Rightbrain GC) considers how the legal risks of using DeepSeek stack up against the risks of using other providers

Author's Note: This is a cross-posting of a post originally published at Carey Lening's excellent Privacat Insights. That post emerged out of a collaboration between Pete and Carey in response to two initial posts about the DeepSeek consumer ChatBot. Those posts provide a lot of the background context to this one.

The version on this site has been heavily edited, and has lost some of the nerdier legal detail and many of the jokes. If you prefer your regulatory insight to take more time and fewer prisoners, you might prefer the original. Also, consider subscribing to Carey's work more generally, which includes some of the best stuff coming out about the intersection of LLMs and data protection law, and takes very few prisoners indeed.

It’s nearly two weeks since the last post about DeepSeek, and the mania for it only seems to be getting wilder. If you have any sort of role in vendor onboarding that means you’re probably hearing a lot of this sort of question:

The business really, really wants to use the new hotness that is DeepSeek. Can we?

Personally, I have a very definite answer to this.

You can self-host DeepSeek models or distillations of them. If you're comfortable with using large language models generally, this may well be a good idea. If self-hosting seems like a technically heavy lift, a number of providers (including Rightbrain!) make it available as part of their wider service.

Using DeepSeek's API or ChatBot is almost certainly not a good idea, at least if you're dealing with personal data, sensitive data, or any data over which you wish to maintain confidentiality.

If the last two paragraphs seem entirely obvious to you, you can probably stop reading this post. If you think that conclusion needs some justification, read on…

DeepSeek Is Not Your Standard Business LLM Provider

If you’re a compliance person being asked to approve use of DeepSeek, there’s a fair chance that the request is coming from the engineering department. If that’s the case, they’ll most likely be wanting to use DeepSeek’s API. 
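For context, "building with the API" usually means something as simple as the sketch below. DeepSeek documents an OpenAI-compatible interface; the base URL and model name here follow its public docs, but treat them as assumptions to verify before use. To keep the sketch self-contained, it only assembles the request rather than sending it:

```python
import json

# Minimal sketch of a DeepSeek chat-completion request. DeepSeek documents
# an OpenAI-compatible API; the base URL and model name below follow its
# public docs but should be verified before real use.
API_BASE = "https://api.deepseek.com"  # assumption: OpenAI-compatible endpoint
API_KEY = "sk-..."                     # placeholder, never hard-code real keys

def build_chat_request(prompt: str) -> tuple[str, dict, dict]:
    """Assemble the URL, headers and JSON body for a chat completion."""
    url = f"{API_BASE}/chat/completions"
    headers = {
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    }
    body = {
        "model": "deepseek-chat",  # assumption: current chat model name
        "messages": [{"role": "user", "content": prompt}],
    }
    return url, headers, body

url, headers, body = build_chat_request("Summarise our vendor policy.")
print(url)
print(json.dumps(body, indent=2))
```

The point is that, from the engineer's side, this looks exactly like calling OpenAI or Anthropic. The differences are all in the terms.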

In addition to its ChatBot, DeepSeek makes available an API so that developers can easily build its service into their products and tools. This is a pretty standard thing for the large LLM providers to do, and developers are used to building with those tools. APIs will be made available subject to terms of use from the provider. So how much difference is there between DeepSeek's API and those of the other big providers?

Actually, quite a lot.

No Business Offering

The fundamental difference between DeepSeek and other big providers is that, whilst DeepSeek have API terms, they don’t have business terms. The line between API and business offerings can often seem pretty indistinct. Indeed, the big LLM providers tend to elide the two: OpenAI and Anthropic combine their business and API terms, and production use of Google’s models tends to fall under its Google Cloud Platform terms. A casual reader might look at DeepSeek’s “Open Platform Terms of Service” (let’s call them “platform terms”) and conclude that they were dealing with a similar set-up. The fact that the terms contemplate “enterprise developer” customers does nothing to allay that impression.

But it’s the wrong impression. Really. DeepSeek’s API is better thought of as a more nerdy outgrowth of its consumer app. The platform terms themselves are presented as a supplement to DeepSeek's consumer Terms of Use (consumer terms). Both the platform terms and the consumer terms apply, with the platform terms taking priority. This is very different to the offerings of the big US providers set out above, who typically have a very clear division between their consumer and business terms. 

Why does that matter? Broadly, because businesses tend to care more about certain things than consumers do, and are in a slightly better position to get them. Businesses are, on average, more likely to read the terms (or pay someone to do it for them), to try and negotiate the terms, and to understand their legal and technical nuances. That results in some fairly predictable differences between consumer terms and business ones. 

Output Liability

The question of liability for outputs is a pretty important one. The whole point of an LLM is to be able to produce outputs, and many of the current policy debates in AI turn on them. Are they accurate? Do they reflect systematic bias? Are they obscene? Could they be used as part of a nihilistic scheme to destroy the world?

In a business context you would usually (and, I think intuitively) expect the provider to take some responsibility for the outputs. If a business were to use DeepSeek's product to make fully automated hiring decisions (just a hypothetical - please do not actually do this), and those decisions turned out to be discriminatory, the affected candidate might sue the business. The business might rightly turn round to DeepSeek and say "This is a problem with your tool, what are you going to do about it?" And they would expect DeepSeek's terms to contain language to the effect of "we'll take some responsibility for the problem, so long as you yourself weren't doing something stupid."[1] The debate would then be over exactly when and to what extent DeepSeek was liable.

As Carey pointed out, DeepSeek's terms do the opposite, making the customer liable to DeepSeek for any harm resulting from the output. This seems… unfair? Weird, even? A business may have some limited control over what its users are putting into an LLM (more than the provider, anyway). If it knew what the output from the provider was going to be, though, it wouldn’t need an LLM in the first place. Asking the business to be liable for something it can’t control seems like a deeply unreasonable request, and yet DeepSeek makes it.

This is a particularly acute problem with IP infringement. If an LLM is producing obscenity, or ducking questions about Tiananmen Square, or being blatantly discriminatory, it’s at least possible to test for that. But the question of whether an LLM’s output infringes any of the copyrights in its training data is a much harder one, both legally and practically. Depending on your jurisdiction, there's just less legal certainty in that area.

Take coding as a concrete example of this. Software coding is one thing that LLMs are widely agreed to be actually pretty useful for, and the idea that businesses might use LLMs to help write code seems intuitively plausible, and seems to have been borne out empirically. But that LLM was probably trained on masses of open source code. If a business uses an LLM to help write code, might it end up liable for infringement? Or, have its codebase subject to the terms of an open source license? It’s really very hard to know until we have more clarity - either from caselaw or legislation.[2]

In the face of that sort of uncertainty, many businesses might opt to just wait and see how the legal questions turn out. Clearly this is not what you'd like if you're trying to sell LLMs. Sure enough, in response to concerns like this, the big providers started making guarantees around IP in late 2023. Anthropic, Google and OpenAI now all provide their business customers with IP indemnities. Here’s an example from OpenAI’s business terms:

We agree to defend and indemnify you for any damages finally awarded by a court of competent jurisdiction and any settlement amounts payable to a third party arising out of a third party claim alleging that the Services (including training data we use to train a model that powers the Services) infringe any third party intellectual property right. 

So, if you’re a business getting sued because someone says your LLM-generated content is a copy of their work, you can expect the big providers to take ownership of that problem for you. If you are a consumer, however, you’re on your own. Consumer terms tend not to contain IP indemnities and, at the risk of getting repetitive, DeepSeek’s terms are consumer terms. 

Model Training

It’s not just LLM output you need to worry about, though. It’s standard practice for the big US LLM providers to train their models on the content (inputs and outputs) processed by the consumer app. As with IP, confidentiality tends to be a pretty big concern of most businesses - the sort of thing that might keep them from using LLMs at all if there was too much uncertainty. Google’s language around the use of customer data submitted through Vertex AI is thus pretty typical of the sorts of reassurances US providers offer: “Google will not use Customer Data to train or fine-tune any AI/ML models without Customer's prior permission or instruction.”

DeepSeek’s terms certainly don’t contain anything that categorical. What uses of customer data can DeepSeek make? I have been scratching my head about this for some time and, honestly, I still have no idea. 

This is the section of the API terms dealing with content usage in its entirety:

4.1 You are responsible for all Inputs you submit to our Services and corresponding Outputs. By submitting Inputs to our Services, you represent and warrant that you have all rights, licenses, and permissions that are necessary for us to process the Inputs under our Terms. You also represent and warrant that your submitting Inputs to us and corresponding Outputs will not violate our Terms, or any laws or regulations applicable to those Inputs and Outputs.

4.2 Subject to applicable law and our Terms, you have the following rights regarding the Inputs and Outputs of the Services: (1) You retain any rights, title, and interests—if any—in the Inputs you submit; (2) We assign any rights, title, and interests—if any—in the Outputs of the Services to you. (3) You may apply the Inputs and Outputs of the Services to a wide range of use cases, including personal use, academic research, derivative product development, training other models (such as model distillation), etc.

The relevant section of the consumer terms reads as follows:

4.1 You are responsible for all Inputs you submit to our Services and corresponding Outputs. By submitting Inputs to our Services, you represent and warrant that you have all rights, licenses, and permissions that are necessary for us to process the Inputs under our Terms. You also represent and warrant that your submitting Inputs to us and corresponding Outputs will not violate our Terms, or any laws or regulations applicable to those Inputs and Outputs.

4.2 Subject to applicable law and our Terms, you have the following rights regarding the Inputs and Outputs of the Services: (1) You retain any rights, title, and interests—if any—in the Inputs you submit; (2) We assign any rights, title, and interests—if any—in the Outputs of the Services to you. (3) You may apply the Inputs and Outputs of the Services to a wide range of use cases, including personal use, academic research, derivative product development, training other models (such as model distillation), etc.

4.3 In order to fulfill the requirements stipulated by laws and regulations or provide the Services specified in these Terms, and under the premise of secure encryption technology processing, strict de-identification rendering, and irreversibility to identify specific individuals, we may, to a minimal extent, use Inputs and Outputs to provide, maintain, operate, develop or improve the Services or the underlying technologies supporting the Services. If you refuse to allow us to process the data in the manner described above, you may provide feedback to us through the methods outlined in Section 10.

As you’ll have noticed, the terms are pretty much identical, except that the consumer terms have an additional paragraph (4.3) allowing DeepSeek to use customer data. But, remember, it’s all part of one big agreement. Those consumer terms also apply to API usage, unless the API terms contradict them. Does the fact that the API terms reproduce the same language but omit the usage paragraph mean that the API terms are contradicting the consumer terms? Does the fact that these terms are governed by Chinese law affect the answer? My guess would be that the right to use data still stands but, frankly, I don’t have a clue. The only firm conclusion I can draw is that this is not a set of terms drafted to soothe the worries of twitchy compliance teams at large businesses.

Data Protection

US big tech, somewhat notoriously, has an uneasy relationship with EU data protection law. Put the name Google, Meta, Amazon or Microsoft into GDPRHub’s advanced search and you’ll find decisions dealing with those parties in respect of pretty much every GDPR article that a private actor can be accused of violating. In the context of businesses, though, I’m going to confine myself to Article 28 (Processor) and Chapter V (Data Transfers).  

For the uninitiated, controllers determine the "purposes and means of processing", and processors just follow the controller's instructions. Big tech companies provide long and detailed data processing agreements in which they, amongst other things like security guarantees, vendor management and audit rights, promise to only process customer data in accordance with the documented instructions of their customer. 

DeepSeek’s approach to data protection is, again, confusing. The data protection section of the terms is brief but, for me at least, kind of fascinating:

5.3 We will collect and process the personal information you provide as the data subject in accordance with the "DeepSeek Privacy Policy". However, when your end users access downstream systems, applications, or functions that you've developed based on the open platform, the processing rules for their collected personal information are not covered by this privacy policy. As the controller of personal information processing activities in that scenario, you should disclose the relevant privacy policy to your end users.

First off: are they a processor or not? They say that the customer is the data controller. And the processing of personal data contained in content submitted through the API isn’t subject to their privacy policy. So… are they a processor? If they are, why wouldn’t they say so?

The question is all the more frustrating because of the semi-fluent GDPR-speak. That’s presumably partly explained by the Chinese data protection framework, which shares a lot of its approach with the GDPR. The Personal Information Protection Law of the People’s Republic of China, I learnt while writing this, defines “personal information processor” (PIP) as something directly analogous to “controller” (they even talk about “purposes and method”). It also has an (undefined) concept of “entrusted party of personal information” which is required to process personal information “as agreed [with the PIP] and shall not process personal information beyond the agreed purpose and method of processing.” So - a processor. Finally, the PIP is also required to “agree with the entrusted party on the purpose, duration, and method of entrusted processing, type and protection measures of personal information as well as the rights and obligations of both parties, and supervise the personal information processing activities of the entrusted party.” And, as with the GDPR and data processing agreements, this agreement needs to be formalised in a “contract”.

While the above explains DeepSeek’s apparent familiarity with EU data protection concepts, it raises at least as many questions as it answers. Most importantly: where’s the contract? DeepSeek's terms don't make reference to one, and there's nothing on their site. It certainly sounds like DeepSeek is intending to be a processor - it seems like the logical implication of saying that the developer is the controller and DeepSeek isn’t. But the means of doing that is nowhere to be found. 

Maybe the answer here is that DeepSeek considers itself a joint controller (another European data protection concept mirrored in the PIPL)? This would, presumably, mean that DeepSeek was training its models on the API data, especially if my suspicions about controlling language between the consumer and API terms in Section 4 are accurate. However, instead of assisting the developers or organisations using the data, DeepSeek offloads all the disclosure, handling of rights requests, and presumably, liability, onto them. Clearly this would be worse from both the compliance and confidentiality points of view. But, mostly, it’s just really unclear what’s actually going on.

Obviously I’m not qualified to make any assessment of how well this works under Chinese law. But from the EU/UK standpoint, it’s fair to say that DeepSeek’s terms are pretty unsatisfactory.

Data Transfers

If you've worked in European tech for any length of time, you will be aware that, periodically, everyone gets worried about sending data to America for reasons that have something to do with US law and possibly Edward Snowden? The details of this have an only-partially-deserved reputation for being incredibly boring, and so are largely ignored by non-data protection specialists. But the nub of the problem is pretty straightforward. EU law gives people rights over data. But if you put that data on servers in a non-EU country, then that country's law applies, and it might not grant you those rights. That's not necessarily a problem in itself, because maybe the owner of the server can just commit to abide by EU law. What it can't do, however, is prevent the government of that country from just requesting your data from whoever is running the server, and then doing something awful with it. This is mostly a problem for America, because that is where many of the servers are. But, in principle, it's a potential problem for any non-EEA country.[3]

It will not have escaped your notice that DeepSeek is a Chinese company. Its primary market is in China, its servers are in China, and it is subject to Chinese law. You may also have heard that, in terms of government uses of data, China is not exactly universally acclaimed for its human rights-first approach. You can see where this is heading. In theory, the data transfers to China story still has a lot of road left to run. The Irish regulator has been investigating TikTok’s transfers for the past God-knows-how-long, with no sign of a decision yet. Max Schrems (the guy who keeps demolishing the US data transfer efforts) has now got in on the action, and those cases will take a while to make it through the courts. But the European Data Protection Board (who, for these purposes, you can think of as the people who produce authoritative guesses about what the law means in the absence of anything better) has already published its thoughts on the possibility of lawful transfers to China. Those are long, detailed, and speculative in places, but I think the fair summary is that lawfully transferring personal data to China is, at best, an uphill struggle.

It’s therefore hard to blame DeepSeek for not playing a GDPR compliance game that they have no realistic chance of winning. 

So the potential scope of government access is a compliance worry for businesses: it's going to be hard to say you're GDPR compliant if you're YOLO'ing customer data to China. More fundamentally, though, there's a general confidentiality concern. Stories about industrial espionage by China have been a constant of the international business press for more than a decade. This has been heating up as the US has taken a more confrontational stance with China on trade issues, meaning that it’s probably going to get a lot worse before it gets better. Much of the reporting emphasises sophisticated measures, the increasing use of human intelligence and a whole load of other stuff designed to give CISOs sleepless nights, but sending data into the country so that the CCP can just request it rather obviates those concerns.

What’s a Business to Do?

All of the above may make it sound like I have some axe to grind against DeepSeek, as well as a touchingly naive faith in US LLM providers. I really don’t. DeepSeek has one advantage over all the other providers covered, and it seems to me pretty decisive. You don’t actually have to trust their service in order to use their models: both R1 and V3 are available on open source terms. The most obvious US comparators are Meta’s Llama models, which are less powerful, less open, and subject to more restrictive license terms. 

Running DeepSeek’s models on more trusted hardware can solve most of the problems outlined above. It avoids the ‘sending confidential or sensitive data to China’ problem, and means organisations are no longer subject to DeepSeek’s terms. They can also stop worrying about some of the other security problems DeepSeek may face. If hosting the model yourself sounds like too heavy a lift and you’d prefer a nice API, an increasing number of vendors (including Rightbrain) are now providing that service. 
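One practical consolation: most self-hosting tools (Ollama and vLLM, for instance) expose the same OpenAI-style interface as the hosted APIs, so moving onto your own hardware is largely a matter of changing the base URL. A minimal sketch, assuming a local OpenAI-compatible server; the localhost port (Ollama's default) and the distilled-model tag are illustrative assumptions, and again the sketch only assembles the request:

```python
import json

# Same OpenAI-style chat request as a hosted API call, but aimed at a
# self-hosted endpoint, so prompts never leave your own infrastructure.
# Assumptions: an OpenAI-compatible server on localhost (Ollama's default
# port is used here) and an illustrative distilled-model tag.
LOCAL_BASE = "http://localhost:11434/v1"
MODEL = "deepseek-r1:8b"

def build_local_request(prompt: str) -> tuple[str, dict]:
    """Assemble the URL and JSON body for a local chat completion."""
    url = f"{LOCAL_BASE}/chat/completions"
    body = {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
    }
    return url, body

url, body = build_local_request("Review this clause for confidentiality risks.")
print(url)
print(json.dumps(body))
```

In other words, the switching cost for developers is close to zero; the terms, and the destination of the data, are what change.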

For businesses trying to mitigate risk, I think there are clear advantages to having alternatives to the big US providers, and particular advantages to those alternatives being open source. In a future post, I'll try to provide some detail around that.


Footnotes

[1] Obviously I am paraphrasing.

[2] Again, this varies by jurisdiction. Japan is fine.

[3] Data protection purists will notice several over-simplifications here. The EU maintains a list of countries whose law it considers acceptable. Again, Japan is fine. The US is included on a complicated conditional basis which keeps being challenged and re-instated with increasingly cumbersome names. Also, I am aware that the question of server location is less important than the possibility of logical access. I don't think any of this affects the larger point I am making, though.



Data Protection

US big tech, somewhat notoriously, has an uneasy relationship with EU data protection law. Put the name Google, Meta, Amazon or Microsoft into GDPRHub’s advanced search and you’ll find decisions dealing with those parties in respect of pretty much every GDPR article that a private actor can be accused of violating. In the context of businesses, though, I’m going to confine myself to Article 28 (Processor) and Chapter V (Data Transfers).  

For the uninitiated, controllers determine the "purposes and means of processing", and processors just follow the controller's instructions. Big tech companies provide long and detailed data processing agreements in which they, amongst other things like security guarantees, vendor management and audit rights, promise to only process customer data in accordance with the documented instructions of their customer. 

DeepSeek’s approach to data protection is, again, confusing. The data protection section of the terms is brief but, for me at least, kind of fascinating:

5.3 We will collect and process the personal information you provide as the data subject in accordance with the "DeepSeek Privacy Policy". However, when your end users access downstream systems, applications, or functions that you've developed based on the open platform, the processing rules for their collected personal information are not covered by this privacy policy. As the controller of personal information processing activities in that scenario, you should disclose the relevant privacy policy to your end users.

First off: are they a processor or not? They say that the customer is the data controller. And the processing of personal data contained in content submitted through the API isn’t subject to their privacy policy. So… are they a processor? If they are, why wouldn’t they say so?

The question is all the more frustrating because of the semi-fluent GDPR-speak. That’s presumably partly explained by the Chinese data protection framework which shares a lot of its approach with the GDPR. The Personal Information Protection Law of the People’s Republic of China, I learnt while writing this, defines “personal information processor” (PIP) as something directly analogous to “controller” (they even talk about “purposes and method”). It also has an (undefined) concept of “entrusted party of personal information” which is required to process personal information “as agreed [with the PIP] and shall not process personal information beyond the agreed purpose and method of processing.” So - a processor. Finally, the PIP is also required to “agree with the entrusted party on the purpose, duration, and method of entrusted processing, type and protection measures of personal information as well as the rights and obligations of both parties, and supervise the personal information processing activities of the entrusted party.” And, as with the GDPR and Data processing agreements, this agreement needs to be formalised in a “contract”.

While the above explains DeepSeek’s apparent familiarity with EU data protection concepts, it raises at least as many questions as it answers. Most importantly: where’s the contract? DeepSeek's terms don't make reference to one, and there's nothing on their site. It certainly sounds like DeepSeek is intending to be a processor - it seems like the logical implication of saying that the developer is the controller and DeepSeek isn’t. But the means of doing that is nowhere to be found. 

Maybe the answer here is that DeepSeek considers itself a joint controller (another European data protection concept mirrored in the PIPL)? This would, presumably, mean that DeepSeek was training its models on the API data, especially if my suspicions about controlling language between the Consumer and API terms in Section 4 are accurate. However, instead of assisting the developers or organisations using the data, DeepSeek offloads all the disclosure, handling of rights requests, and presumably, liability, onto them.  Clearly this would be worse from both the compliance and confidentiality points of view. But, mostly, it’s just really unclear what’s actually going on.

Obviously I’m not qualified to make any assessment of how well this works under Chinese law. But from the EU/UK standpoint, it’s fair to say that DeepSeek’s terms are pretty unsatisfactory.

Data Transfers

If you've worked in European tech for any length of time, you will be aware that, periodically, everyone gets worried about sending data to America for reasons that have something to do with US law and possibly Edward Snowden? The details of this have an only-partially-deserved reputation for being incredibly boring, and so are largely ignored by non-data protection specialists. But the nub of the problem is pretty straightforward. EU law gives people rights over data. But if you put that data on servers in a non-EU country, then that country's law applies, and it might not grant you those rights. That's not necessarily a problem in itself, because maybe the owner of the server can just commit to abide by EU law. What it can't do, however, is prevent the government of that country from just requesting your data from whoever is running the server, and then doing something awful with it. This is mostly a problem for America, because that is where many of the servers are. But, in principle, it's a potential problem for any non-EEA country.[3]

It will not have escaped your notice that DeepSeek is a Chinese company. Its primary market is in China, its servers are in China, and it is subject to Chinese law. You may also have heard that, in terms of government uses of data, China is not exactly universally acclaimed for it's human rights-first approach. You can see where this is heading. In theory, the data transfers to China story still has a lot of road left to run. The Irish regulator has been investigating TikTok’s transfers for the past God-knows-how-long, with no sign of a decision yet. Max Schrems (the guy who keeps demolishing the US data transfer efforts) has now got in on the action, and those cases will take a while to make it through the courts. But the European Data Protection Board (who, for these purposes, you can think of as the people who produce authoritative guesses about what the law means in the absence of anything better) has already published its thoughts on the possibility of lawful transfers to China. Those are long, detailed, and speculative in places, but I think they can fairly be summarised as follows:

It’s therefore hard to blame DeepSeek for not playing a GDPR compliance game that they have no realistic chance of winning. 

So the potential scope of government access is a compliance worry for businesses because it's going to be hard to say you're GDPR compliant if you're YOLO'ing customer data to China. But more fundamentally, though, there's a general confidentiality concern. Stories about industrial espionage by China have been a constant of the international business press for more than a decade. This has been heating up as the US has taken a more confrontational stance with China on trade issues, meaning that it’s probably going to get a lot worse before it gets better. Much of the reporting emphasises sophisticated measures, the increasing use of human intelligence and a whole load of other stuff designed to give CISO’s sleepless nights, but sending data into the country so that the CCP can just request it rather obviates those concerns.

What’s a Business to Do?

All of the above may make it sound like I have some axe to grind against DeepSeek, as well as a touchingly naive faith in US LLM providers. I really don’t. DeepSeek has one advantage over all the other providers covered, and it seems to me pretty decisive. You don’t actually have to trust their service in order to use their models: both R1 and V3 are available on open source terms. The most obvious US comparators are Meta’s Llama models, which are less powerful, less open, and subject to more restrictive license terms. 

Running DeepSeek’s models on more trusted hardware can solve most of the problems outlined above. It avoids the ‘sending confidential or sensitive data to China’ problem,  and means organisations are no longer subject to DeepSeek’s terms. They can also stop worrying about some of the other security problems DeepSeek may face. If hosting the model yourself sounds like too heavy a lift and you’d prefer a nice API, an increasing number of vendors (including rightBrain) are now providing that service. 

For a businesses trying to mitigate risk, I think there are clear advantages to having alternatives to the big US providers, and particular advantages to those alternatives being open source. In a future post, I'll try to provide some detail around that.


Footnotes

[1] Obviously I am paraphrasing.

[2] Again, this varies by jurisdiction. Japan is fine.

[3] Data protection purists will notice several over-simplifications here. The EU maintains a list of countries whose law it considers acceptable. Again, Japan is fine. The US is included on a complicated conditional basis which keeps being challenged and re-instated with increasingly cumbersome names. Also, I am aware that the question of server location is less important than the possibility of logical access. I don't think any of this affects the larger point I am making, though.


Author's Note: This is a cross-posting of post originally published at Carey Lening's excellent Privacat Insights. That post emerged out of a collaboration between Pete and Carey in response to two initial posts about the DeepSeek consumer ChatBot. Those posts provide a lot of the background context to this one.

The version on this site has been heavily edited, and has lost of some of the the nerdier legal detail and many of the jokes. If you prefer your regulatory insight to take more time and fewer prisoners, you might prefer the original. Also, consider subscribing to Carey's work more generally, which includes some of the best stuff coming out about the intersection of LLMs and data protection law, and takes very few prisoners indeed.


If you’re a compliance person being asked to approve use of DeepSeek, there’s a fair chance that the request is coming from the engineering department. If that’s the case, they’ll most likely be wanting to use DeepSeek’s API. 

In addition to its ChatBot, DeepSeek makes an API available so that developers can easily build its service into their own products and tools. This is a pretty standard thing for the large LLM providers to do, and developers are used to building with those tools. APIs are made available subject to the provider’s terms of use. So how much difference is there between DeepSeek’s API terms and those of the other big providers?

Actually, quite a lot.

No Business Offering

The fundamental difference between DeepSeek and other big providers is that, whilst DeepSeek have API terms, they don’t have business terms. The line between API and business offerings can often seem pretty indistinct. Indeed, the big LLM providers tend to elide the two: OpenAI and Anthropic combine their business and API terms, and production use of Google’s models tends to fall under its Google Cloud Platform terms. A casual reader might look at DeepSeek’s “Open Platform Terms of Service” (let’s call them “platform terms”) and conclude that they were dealing with a similar set-up. The fact that the terms contemplate “enterprise developer” customers does nothing to dispel that impression.

But it’s the wrong impression. Really. DeepSeek’s API is better thought of as a more nerdy outgrowth of its consumer app. The Platform Terms themselves are presented as a supplement to DeepSeek's consumer Terms of Use (consumer terms). Both the platform terms and the consumer terms apply, with the platform terms taking priority. This is very different to the offerings of the big US providers set out above, who typically have a very clear division between their consumer and business terms. 

Why does that matter? Broadly, because businesses tend to care more about certain things than consumers do, and are in a better position to get them. Businesses are, on average, more likely to read the terms (or pay someone to do it for them), to try to negotiate them, and to understand their legal and technical nuances. That results in some fairly predictable differences between consumer terms and business ones.

Output Liability

The question of liability for outputs is a pretty important one. The whole point of an LLM is to produce outputs, and many of the current policy debates in AI concern them. Are they accurate? Do they reflect systematic bias? Are they obscene? Could they be used as part of a nihilistic scheme to destroy the world?

In a business context you would usually (and, I think intuitively) expect the provider to take some responsibility for the outputs. If a business were to use DeepSeek's product to make fully automated hiring decisions (just a hypothetical - please do not actually do this), and those decisions turned out to be discriminatory, the affected candidate might sue the business. The business might rightly turn round to DeepSeek and say "This is a problem with your tool, what are you going to do about it?" And they would expect DeepSeek's terms to contain language to the effect of "we'll take some responsibility for the problem, so long as you yourself weren't doing something stupid."[1] The debate would then be over exactly when and to what extent DeepSeek was liable.

As Carey pointed out, DeepSeek's terms do the opposite, making the customer liable to DeepSeek for any harm resulting from the output. This seems… unfair? Weird, even? A business may have some limited control over what its users are putting into an LLM (more than the provider, anyway). If it knew what the output from the provider was going to be, though, it wouldn’t need an LLM in the first place. Asking the business to be liable for something it can’t control seems like a deeply unreasonable request to make, and yet that is exactly what DeepSeek does.

This is a particularly acute problem with IP infringement. If an LLM is producing obscenity, or ducking questions about Tiananmen Square, or being blatantly discriminatory, it’s at least possible to test for that. But the question of whether an LLM’s output infringes any of the copyrights in its training data is a much harder one, both legally and practically. Depending on your jurisdiction, there's just less legal certainty in that area.

Take coding as a concrete example. Software development is one thing that LLMs are widely agreed to be genuinely useful for, and the idea that businesses might use them to help write code is intuitively plausible and seems to have been borne out empirically. But that LLM was probably trained on masses of open source code. If a business uses an LLM to help write code, might it end up liable for infringement? Or find its codebase subject to the terms of an open source license? It’s really very hard to know until we have more clarity - either from caselaw or legislation.[2]

In the face of that sort of uncertainty, many businesses might opt to just wait and see how the legal questions turn out. Clearly this is not what you'd like if you're trying to sell LLMs. Sure enough, in response to concerns like this, the big providers started making guarantees around IP in late 2023. Anthropic, Google and OpenAI now all provide their business customers with IP indemnities. Here’s an example from OpenAI’s business terms:

We agree to defend and indemnify you for any damages finally awarded by a court of competent jurisdiction and any settlement amounts payable to a third party arising out of a third party claim alleging that the Services (including training data we use to train a model that powers the Services) infringe any third party intellectual property right. 

So, if you’re a business getting sued because someone says your LLM-generated content is a copy of their work, you can expect the big providers to take ownership of that problem for you. If you are a consumer, however, you’re on your own. Consumer terms tend not to contain IP indemnities and, at the risk of getting repetitive, DeepSeek’s terms are consumer terms.

Model Training

It’s not just LLM output you need to worry about, though. It’s standard practice for the big US LLM providers to train their models on the content (inputs and outputs) processed by their consumer apps - but, crucially, not on content submitted through their business offerings. As with IP, confidentiality tends to be a pretty big concern for most businesses - the sort of thing that might keep them from using LLMs at all if there were too much uncertainty. Google’s language around the use of customer data submitted through Vertex AI is thus pretty typical of the sort of reassurance US providers offer business customers: “Google will not use Customer Data to train or fine-tune any AI/ML models without Customer's prior permission or instruction.”

DeepSeek’s terms certainly don’t contain anything that categorical. What uses of customer data can DeepSeek make? I have been scratching my head about this for some time and, honestly, I still have no idea. 

This is the section of the API terms dealing with content usage in its entirety:

4.1 You are responsible for all Inputs you submit to our Services and corresponding Outputs. By submitting Inputs to our Services, you represent and warrant that you have all rights, licenses, and permissions that are necessary for us to process the Inputs under our Terms. You also represent and warrant that your submitting Inputs to us and corresponding Outputs will not violate our Terms, or any laws or regulations applicable to those Inputs and Outputs.

4.2 Subject to applicable law and our Terms, you have the following rights regarding the Inputs and Outputs of the Services: (1) You retain any rights, title, and interests—if any—in the Inputs you submit; (2) We assign any rights, title, and interests—if any—in the Outputs of the Services to you. (3) You may apply the Inputs and Outputs of the Services to a wide range of use cases, including personal use, academic research, derivative product development, training other models (such as model distillation), etc.

The relevant section of the consumer terms reads as follows:

4.1 You are responsible for all Inputs you submit to our Services and corresponding Outputs. By submitting Inputs to our Services, you represent and warrant that you have all rights, licenses, and permissions that are necessary for us to process the Inputs under our Terms. You also represent and warrant that your submitting Inputs to us and corresponding Outputs will not violate our Terms, or any laws or regulations applicable to those Inputs and Outputs.

4.2 Subject to applicable law and our Terms, you have the following rights regarding the Inputs and Outputs of the Services: (1) You retain any rights, title, and interests—if any—in the Inputs you submit; (2) We assign any rights, title, and interests—if any—in the Outputs of the Services to you. (3) You may apply the Inputs and Outputs of the Services to a wide range of use cases, including personal use, academic research, derivative product development, training other models (such as model distillation), etc.

4.3 In order to fulfill the requirements stipulated by laws and regulations or provide the Services specified in these Terms, and under the premise of secure encryption technology processing, strict de-identification rendering, and irreversibility to identify specific individuals, we may, to a minimal extent, use Inputs and Outputs to provide, maintain, operate, develop or improve the Services or the underlying technologies supporting the Services. If you refuse to allow us to process the data in the manner described above, you may provide feedback to us through the methods outlined in Section 10.

As you’ll have noticed, the terms are pretty much identical, except that the consumer terms have an additional paragraph (4.3) allowing DeepSeek to use customer data. But, remember, it’s all part of one big agreement: the consumer terms also apply to API usage, unless the API terms contradict them. Does the fact that the API terms reproduce the same language but omit the usage paragraph mean that they contradict the consumer terms? Does the fact that these terms are governed by Chinese law affect the answer? My guess would be that the right to use data still stands but, frankly, I don’t have a clue. The only firm conclusion I can draw is that this is not a set of terms drafted to soothe the worries of twitchy compliance teams at large businesses.

Data Protection

US big tech, somewhat notoriously, has an uneasy relationship with EU data protection law. Put the names Google, Meta, Amazon or Microsoft into GDPRHub’s advanced search and you’ll find decisions dealing with those parties in respect of pretty much every GDPR article that a private actor can be accused of violating. In the business context, though, I’m going to confine myself to Article 28 (Processor) and Chapter V (Data Transfers).

For the uninitiated, controllers determine the "purposes and means of processing", and processors just follow the controller's instructions. Big tech companies provide long and detailed data processing agreements in which - alongside things like security guarantees, vendor management and audit rights - they promise to process customer data only in accordance with their customer's documented instructions.

DeepSeek’s approach to data protection is, again, confusing. The data protection section of the terms is brief but, for me at least, kind of fascinating:

5.3 We will collect and process the personal information you provide as the data subject in accordance with the "DeepSeek Privacy Policy". However, when your end users access downstream systems, applications, or functions that you've developed based on the open platform, the processing rules for their collected personal information are not covered by this privacy policy. As the controller of personal information processing activities in that scenario, you should disclose the relevant privacy policy to your end users.

First off: are they a processor or not? They say that the customer is the data controller. And the processing of personal data contained in content submitted through the API isn’t subject to their privacy policy. So… are they a processor? If they are, why wouldn’t they say so?

The question is all the more frustrating because of the semi-fluent GDPR-speak. That’s presumably partly explained by the Chinese data protection framework which shares a lot of its approach with the GDPR. The Personal Information Protection Law of the People’s Republic of China, I learnt while writing this, defines “personal information processor” (PIP) as something directly analogous to “controller” (they even talk about “purposes and method”). It also has an (undefined) concept of “entrusted party of personal information” which is required to process personal information “as agreed [with the PIP] and shall not process personal information beyond the agreed purpose and method of processing.” So - a processor. Finally, the PIP is also required to “agree with the entrusted party on the purpose, duration, and method of entrusted processing, type and protection measures of personal information as well as the rights and obligations of both parties, and supervise the personal information processing activities of the entrusted party.” And, as with the GDPR and Data processing agreements, this agreement needs to be formalised in a “contract”.

While the above explains DeepSeek’s apparent familiarity with EU data protection concepts, it raises at least as many questions as it answers. Most importantly: where’s the contract? DeepSeek's terms don't make reference to one, and there's nothing on their site. It certainly sounds like DeepSeek intends to be a processor - that seems like the logical implication of saying that the developer is the controller and DeepSeek isn’t. But the contract needed to make that arrangement work is nowhere to be found.

Maybe the answer here is that DeepSeek considers itself a joint controller (another European data protection concept mirrored in the PIPL)? That would fit with DeepSeek training its models on the API data, especially if my suspicions about the controlling language between the consumer and API terms in Section 4 are accurate. However, instead of assisting the developers or organisations using the data, DeepSeek offloads all the disclosure, the handling of rights requests and, presumably, the liability onto them. Clearly this would be worse from both the compliance and confidentiality points of view. But, mostly, it’s just really unclear what’s actually going on.

Obviously I’m not qualified to make any assessment of how well this works under Chinese law. But from the EU/UK standpoint, it’s fair to say that DeepSeek’s terms are pretty unsatisfactory.

Data Transfers

If you've worked in European tech for any length of time, you will be aware that, periodically, everyone gets worried about sending data to America for reasons that have something to do with US law and possibly Edward Snowden? The details of this have an only-partially-deserved reputation for being incredibly boring, and so are largely ignored by non-data protection specialists. But the nub of the problem is pretty straightforward. EU law gives people rights over data. But if you put that data on servers in a non-EU country, then that country's law applies, and it might not grant you those rights. That's not necessarily a problem in itself, because maybe the owner of the server can just commit to abide by EU law. What it can't do, however, is prevent the government of that country from just requesting your data from whoever is running the server, and then doing something awful with it. This is mostly a problem for America, because that is where many of the servers are. But, in principle, it's a potential problem for any non-EEA country.[3]

It will not have escaped your notice that DeepSeek is a Chinese company. Its primary market is in China, its servers are in China, and it is subject to Chinese law. You may also have heard that, in terms of government uses of data, China is not exactly universally acclaimed for its human-rights-first approach. You can see where this is heading. In theory, the data transfers to China story still has a lot of road left to run. The Irish regulator has been investigating TikTok’s transfers for the past God-knows-how-long, with no sign of a decision yet. Max Schrems (the guy who keeps demolishing the US data transfer efforts) has now got in on the action, and those cases will take a while to make it through the courts. But the European Data Protection Board (who, for these purposes, you can think of as the people who produce authoritative guesses about what the law means in the absence of anything better) has already published its thoughts on the possibility of lawful transfers to China. Those are long, detailed, and speculative in places, but I think they can fairly be summarised as: lawful transfers of personal data to China are going to be extremely hard to justify.

It’s therefore hard to blame DeepSeek for not playing a GDPR compliance game that they have no realistic chance of winning. 

So the potential scope of government access is a compliance worry for businesses, because it's going to be hard to say you're GDPR compliant if you're YOLO'ing customer data to China. More fundamentally, though, there's a general confidentiality concern. Stories about industrial espionage by China have been a constant of the international business press for more than a decade. This has been heating up as the US has taken a more confrontational stance with China on trade, meaning it’s probably going to get a lot worse before it gets better. Much of the reporting emphasises sophisticated technical measures, the increasing use of human intelligence, and a whole load of other stuff designed to give CISOs sleepless nights - but sending data into the country so that the CCP can simply request it rather obviates the need for any of that.

What’s a Business to Do?

All of the above may make it sound like I have some axe to grind against DeepSeek, as well as a touchingly naive faith in US LLM providers. I really don’t. DeepSeek has one advantage over all the other providers covered, and it seems to me pretty decisive. You don’t actually have to trust their service in order to use their models: both R1 and V3 are available on open source terms. The most obvious US comparators are Meta’s Llama models, which are less powerful, less open, and subject to more restrictive license terms. 

Running DeepSeek’s models on more trusted hardware can solve most of the problems outlined above. It avoids the ‘sending confidential or sensitive data to China’ problem,  and means organisations are no longer subject to DeepSeek’s terms. They can also stop worrying about some of the other security problems DeepSeek may face. If hosting the model yourself sounds like too heavy a lift and you’d prefer a nice API, an increasing number of vendors (including rightBrain) are now providing that service. 

For a businesses trying to mitigate risk, I think there are clear advantages to having alternatives to the big US providers, and particular advantages to those alternatives being open source. In a future post, I'll try to provide some detail around that.


Footnotes

[1] Obviously I am paraphrasing.

[2] Again, this varies by jurisdiction. Japan is fine.

[3] Data protection purists will notice several over-simplifications here. The EU maintains a list of countries whose law it considers acceptable. Again, Japan is fine. The US is included on a complicated conditional basis which keeps being challenged and re-instated with increasingly cumbersome names. Also, I am aware that the question of server location is less important than the possibility of logical access. I don't think any of this affects the larger point I am making, though.


Author's Note: This is a cross-posting of post originally published at Carey Lening's excellent Privacat Insights. That post emerged out of a collaboration between Pete and Carey in response to two initial posts about the DeepSeek consumer ChatBot. Those posts provide a lot of the background context to this one.

The version on this site has been heavily edited, and has lost of some of the the nerdier legal detail and many of the jokes. If you prefer your regulatory insight to take more time and fewer prisoners, you might prefer the original. Also, consider subscribing to Carey's work more generally, which includes some of the best stuff coming out about the intersection of LLMs and data protection law, and takes very few prisoners indeed.

It’s nearly two weeks since the last post about DeepSeek, and the mania for it only seems to be getting wilder. If you have any sort of role in vendor onboarding that means you’re probably hearing a lot of this sort of question:

The business really, really wants to use the new hotness that is DeepSeek. Can we?

Personally, I have a very definite answer to this.

You can self-host DeepSeek models or distillations of them. If you're comfortable with using large language models generally, the may well be a good idea. If self-hosting seems like a technically heavy lift, a number of providers (including Rightbrain!) make it available as part of their wider service.

Using DeepSeek's API or ChatBot is almost certainly not a good idea, at least if you're dealing with personal data, sensitive data, or any data over which you wish to maintain confidentiality.

If the last two paragraphs seem entirely obvious to you, you can probably stop reading this post. If you think that conclusion needs some justification, read on…

DeepSeek Is Not Your Standard Business LLM Provider

If you’re a compliance person being asked to approve use of DeepSeek, there’s a fair chance that the request is coming from the engineering department. If that’s the case, they’ll most likely be wanting to use DeepSeek’s API. 

In addition to its ChatBot, DeepSeek makes available an API so that developers can easily build its service into their products and tools. This is a pretty standard thing for the large LLM providers to do, and developers are used to building with those tools. APIs will be made available subject to terms of use from the provider. So how much difference is there between DeepSeek's API terms and those of the other big providers?
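To make the switching-cost point concrete: DeepSeek's API follows the OpenAI chat-completions conventions, so trying it is a two-line change for most developers. The sketch below builds the request a standard client would send; the base URL and model name are assumptions taken from DeepSeek's public docs, and the live call only runs if an API key happens to be configured.

```python
# Sketch of a DeepSeek API call via the OpenAI-compatible interface.
# "deepseek-chat" and "https://api.deepseek.com" are assumptions drawn from
# DeepSeek's public docs; swap in whatever their current docs say.
import os


def build_chat_request(user_content: str, model: str = "deepseek-chat") -> dict:
    """Assemble the chat-completions payload a standard client would send."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_content}],
    }


# Only attempt a live call if a key is configured (requires `pip install openai`).
if os.environ.get("DEEPSEEK_API_KEY"):
    from openai import OpenAI

    client = OpenAI(
        api_key=os.environ["DEEPSEEK_API_KEY"],
        base_url="https://api.deepseek.com",
    )
    response = client.chat.completions.create(**build_chat_request("Hello"))
    print(response.choices[0].message.content)
```

The ease of that switch is exactly why the terms question matters: nothing in the tooling reminds a developer that their data is now governed by a very different set of terms.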

Actually, quite a lot.

No Business Offering

The fundamental difference between DeepSeek and other big providers is that, whilst DeepSeek have API terms, they don’t have business terms. The line between API and business offerings can often seem pretty indistinct. Indeed, the big LLM providers tend to elide the two: OpenAI and Anthropic combine their business and API terms, and production use of Google’s models tends to fall under its Google Cloud Platform terms. A casual reader might look at DeepSeek’s “Open Platform Terms of Service” (let’s call them “platform terms”) and conclude that they were dealing with a similar set-up. The fact that the terms contemplate “enterprise developer” customers does nothing to allay that impression.

But it’s the wrong impression. Really. DeepSeek’s API is better thought of as a more nerdy outgrowth of its consumer app. The Platform Terms themselves are presented as a supplement to DeepSeek's consumer Terms of Use (consumer terms). Both the platform terms and the consumer terms apply, with the platform terms taking priority. This is very different to the offerings of the big US providers set out above, who typically have a very clear division between their consumer and business terms. 

Why does that matter? Broadly, because businesses tend to care more about certain things than consumers do, and are in a slightly better position to get them. Businesses are, on average, more likely to read the terms (or pay someone to do it for them), to try and negotiate the terms, and to understand their legal and technical nuances. That results in some fairly predictable differences between consumer terms and business ones.

Output Liability

The question of liability for outputs is a pretty important one. The whole point of an LLM is to be able to produce outputs, and many of the current policy debates in AI concern them. Are they accurate? Do they reflect systematic bias? Are they obscene? Could they be used as part of a nihilistic scheme to destroy the world?

In a business context you would usually (and, I think intuitively) expect the provider to take some responsibility for the outputs. If a business were to use DeepSeek's product to make fully automated hiring decisions (just a hypothetical - please do not actually do this), and those decisions turned out to be discriminatory, the affected candidate might sue the business. The business might rightly turn round to DeepSeek and say "This is a problem with your tool, what are you going to do about it?" And they would expect DeepSeek's terms to contain language to the effect of "we'll take some responsibility for the problem, so long as you yourself weren't doing something stupid."[1] The debate would then be over exactly when and to what extent DeepSeek was liable.

As Carey pointed out, DeepSeek's terms do the opposite, making the customer liable to DeepSeek for any harm resulting from the output. This seems… unfair? Weird, even? A business may have some limited control over what its users are putting into an LLM (more than the provider has, anyway). If it knew what the output from the provider was going to be, though, it wouldn’t need an LLM in the first place. Asking the business to be liable for something it can’t control seems like a deeply unreasonable request to make, and yet DeepSeek makes it.

This is a particularly acute problem with IP infringement. If an LLM is producing obscenity, or ducking questions about Tiananmen Square, or being blatantly discriminatory, it’s at least possible to test for that. But the question of whether an LLM’s output infringes any of the copyrights in its training data is a much harder one, both legally and practically. Depending on your jurisdiction, there's just less legal certainty in that area.

Take coding as a concrete example of this. Software coding is one thing that LLMs are widely agreed to be actually pretty useful for, and the idea that businesses might use LLMs to help write code seems intuitively plausible, and seems to have been borne out empirically. But that LLM was probably trained on masses of open source code. If a business uses an LLM to help write code, might it end up liable for infringement? Or have its codebase become subject to the terms of an open source license? It’s really very hard to know until we have more clarity - either from caselaw or legislation.[2]

In the face of that sort of uncertainty, many businesses might opt to just wait and see how the legal questions turn out. Clearly this is not what you'd like if you're trying to sell LLMs. Sure enough, in response to concerns like this, the big providers started making guarantees around IP in late 2023. Anthropic, Google and OpenAI now all provide their business customers with IP indemnities. Here’s an example from OpenAI’s business terms:

We agree to defend and indemnify you for any damages finally awarded by a court of competent jurisdiction and any settlement amounts payable to a third party arising out of a third party claim alleging that the Services (including training data we use to train a model that powers the Services) infringe any third party intellectual property right. 

So, if you’re a business getting sued because someone says your LLM-generated content is a copy of their work, you can expect the big providers to take ownership of that problem for you. If you are a consumer, however, you’re on your own. Consumer terms tend not to contain IP indemnities and, at the risk of getting repetitive, DeepSeek’s terms are consumer terms.

Model Training

It’s not just LLM output you need to worry about, though. It’s standard practice for the big US LLM providers to train their models on the content (inputs and outputs) processed by the consumer app. As with IP, confidentiality tends to be a pretty big concern of most businesses - the sort of thing that might keep them from using LLMs at all if there was too much uncertainty. Google’s language around the use of customer data submitted through Vertex AI is thus pretty typical of the sorts of reassurances US providers offer: “Google will not use Customer Data to train or fine-tune any AI/ML models without Customer's prior permission or instruction.”

DeepSeek’s terms certainly don’t contain anything that categorical. What uses of customer data can DeepSeek make? I have been scratching my head about this for some time and, honestly, I still have no idea. 

This is the section of the API terms dealing with content usage in its entirety:

4.1 You are responsible for all Inputs you submit to our Services and corresponding Outputs. By submitting Inputs to our Services, you represent and warrant that you have all rights, licenses, and permissions that are necessary for us to process the Inputs under our Terms. You also represent and warrant that your submitting Inputs to us and corresponding Outputs will not violate our Terms, or any laws or regulations applicable to those Inputs and Outputs.

4.2 Subject to applicable law and our Terms, you have the following rights regarding the Inputs and Outputs of the Services: (1) You retain any rights, title, and interests—if any—in the Inputs you submit; (2) We assign any rights, title, and interests—if any—in the Outputs of the Services to you. (3) You may apply the Inputs and Outputs of the Services to a wide range of use cases, including personal use, academic research, derivative product development, training other models (such as model distillation), etc.

The relevant section of the consumer terms reads as follows:

4.1 You are responsible for all Inputs you submit to our Services and corresponding Outputs. By submitting Inputs to our Services, you represent and warrant that you have all rights, licenses, and permissions that are necessary for us to process the Inputs under our Terms. You also represent and warrant that your submitting Inputs to us and corresponding Outputs will not violate our Terms, or any laws or regulations applicable to those Inputs and Outputs.

4.2 Subject to applicable law and our Terms, you have the following rights regarding the Inputs and Outputs of the Services: (1) You retain any rights, title, and interests—if any—in the Inputs you submit; (2) We assign any rights, title, and interests—if any—in the Outputs of the Services to you. (3) You may apply the Inputs and Outputs of the Services to a wide range of use cases, including personal use, academic research, derivative product development, training other models (such as model distillation), etc.

4.3 In order to fulfill the requirements stipulated by laws and regulations or provide the Services specified in these Terms, and under the premise of secure encryption technology processing, strict de-identification rendering, and irreversibility to identify specific individuals, we may, to a minimal extent, use Inputs and Outputs to provide, maintain, operate, develop or improve the Services or the underlying technologies supporting the Services. If you refuse to allow us to process the data in the manner described above, you may provide feedback to us through the methods outlined in Section 10.

As you’ll have noticed, the terms are pretty much identical, except that the consumer terms have an additional paragraph (4.3) allowing DeepSeek to use customer data. But, remember, it’s all part of one big agreement. Those consumer terms also apply to API usage, unless the API terms contradict them. Does the fact that the API terms reproduce the same language but omit the usage paragraph mean that the API terms are contradicting the consumer terms? Does the fact that these terms are governed by Chinese law affect the answer? My guess would be that the right to use data still stands but, frankly, I don’t have a clue. The only firm conclusion I can draw is that this is not a set of terms drafted to soothe the worries of twitchy compliance teams at large businesses.

Data Protection

US big tech, somewhat notoriously, has an uneasy relationship with EU data protection law. Put the name Google, Meta, Amazon or Microsoft into GDPRHub’s advanced search and you’ll find decisions dealing with those parties in respect of pretty much every GDPR article that a private actor can be accused of violating. In the context of businesses, though, I’m going to confine myself to Article 28 (Processor) and Chapter V (Data Transfers).  

For the uninitiated, controllers determine the "purposes and means of processing", and processors just follow the controller's instructions. Big tech companies provide long and detailed data processing agreements in which they, amongst other things like security guarantees, vendor management and audit rights, promise to only process customer data in accordance with the documented instructions of their customer. 

DeepSeek’s approach to data protection is, again, confusing. The data protection section of the terms is brief but, for me at least, kind of fascinating:

5.3 We will collect and process the personal information you provide as the data subject in accordance with the "DeepSeek Privacy Policy". However, when your end users access downstream systems, applications, or functions that you've developed based on the open platform, the processing rules for their collected personal information are not covered by this privacy policy. As the controller of personal information processing activities in that scenario, you should disclose the relevant privacy policy to your end users.

First off: are they a processor or not? They say that the customer is the data controller. And the processing of personal data contained in content submitted through the API isn’t subject to their privacy policy. So… are they a processor? If they are, why wouldn’t they say so?

The question is all the more frustrating because of the semi-fluent GDPR-speak. That’s presumably partly explained by the Chinese data protection framework which shares a lot of its approach with the GDPR. The Personal Information Protection Law of the People’s Republic of China, I learnt while writing this, defines “personal information processor” (PIP) as something directly analogous to “controller” (they even talk about “purposes and method”). It also has an (undefined) concept of “entrusted party of personal information” which is required to process personal information “as agreed [with the PIP] and shall not process personal information beyond the agreed purpose and method of processing.” So - a processor. Finally, the PIP is also required to “agree with the entrusted party on the purpose, duration, and method of entrusted processing, type and protection measures of personal information as well as the rights and obligations of both parties, and supervise the personal information processing activities of the entrusted party.” And, as with the GDPR and Data processing agreements, this agreement needs to be formalised in a “contract”.

While the above explains DeepSeek’s apparent familiarity with EU data protection concepts, it raises at least as many questions as it answers. Most importantly: where’s the contract? DeepSeek's terms don't make reference to one, and there's nothing on their site. It certainly sounds like DeepSeek is intending to be a processor - it seems like the logical implication of saying that the developer is the controller and DeepSeek isn’t. But the means of doing that is nowhere to be found. 

Maybe the answer here is that DeepSeek considers itself a joint controller (another European data protection concept mirrored in the PIPL)? This would, presumably, mean that DeepSeek was training its models on the API data, especially if my suspicions about controlling language between the Consumer and API terms in Section 4 are accurate. However, instead of assisting the developers or organisations using the data, DeepSeek offloads all the disclosure, handling of rights requests, and presumably, liability, onto them.  Clearly this would be worse from both the compliance and confidentiality points of view. But, mostly, it’s just really unclear what’s actually going on.

Obviously I’m not qualified to make any assessment of how well this works under Chinese law. But from the EU/UK standpoint, it’s fair to say that DeepSeek’s terms are pretty unsatisfactory.

Data Transfers

If you've worked in European tech for any length of time, you will be aware that, periodically, everyone gets worried about sending data to America for reasons that have something to do with US law and possibly Edward Snowden? The details of this have an only-partially-deserved reputation for being incredibly boring, and so are largely ignored by non-data protection specialists. But the nub of the problem is pretty straightforward. EU law gives people rights over data. But if you put that data on servers in a non-EU country, then that country's law applies, and it might not grant you those rights. That's not necessarily a problem in itself, because maybe the owner of the server can just commit to abide by EU law. What it can't do, however, is prevent the government of that country from just requesting your data from whoever is running the server, and then doing something awful with it. This is mostly a problem for America, because that is where many of the servers are. But, in principle, it's a potential problem for any non-EEA country.[3]

It will not have escaped your notice that DeepSeek is a Chinese company. Its primary market is in China, its servers are in China, and it is subject to Chinese law. You may also have heard that, in terms of government uses of data, China is not exactly universally acclaimed for its human-rights-first approach. You can see where this is heading. In theory, the data transfers to China story still has a lot of road left to run. The Irish regulator has been investigating TikTok’s transfers for the past God-knows-how-long, with no sign of a decision yet. Max Schrems (the guy who keeps demolishing the US data transfer efforts) has now got in on the action, and those cases will take a while to make it through the courts. But the European Data Protection Board (who, for these purposes, you can think of as the people who produce authoritative guesses about what the law means in the absence of anything better) has already published its thoughts on the possibility of lawful transfers to China. Those are long, detailed, and speculative in places, but I think they can fairly be summarised as: good luck with that.

It’s therefore hard to blame DeepSeek for not playing a GDPR compliance game that they have no realistic chance of winning. 

So the potential scope of government access is a compliance worry for businesses, because it's going to be hard to say you're GDPR compliant if you're YOLO'ing customer data to China. More fundamentally, though, there's a general confidentiality concern. Stories about industrial espionage by China have been a constant of the international business press for more than a decade. This has been heating up as the US has taken a more confrontational stance with China on trade issues, meaning that it’s probably going to get a lot worse before it gets better. Much of the reporting emphasises sophisticated technical measures, the increasing use of human intelligence, and a whole load of other stuff designed to give CISOs sleepless nights, but sending data into the country so that the CCP can simply request it rather obviates those concerns.

What’s a Business to Do?

All of the above may make it sound like I have some axe to grind against DeepSeek, as well as a touchingly naive faith in US LLM providers. I really don’t. DeepSeek has one advantage over all the other providers covered, and it seems to me pretty decisive. You don’t actually have to trust their service in order to use their models: both R1 and V3 are available on open source terms. The most obvious US comparators are Meta’s Llama models, which are less powerful, less open, and subject to more restrictive license terms. 

Running DeepSeek’s models on more trusted hardware can solve most of the problems outlined above. It avoids the ‘sending confidential or sensitive data to China’ problem, and means organisations are no longer subject to DeepSeek’s terms. They can also stop worrying about some of the other security problems DeepSeek may face. If hosting the model yourself sounds like too heavy a lift and you’d prefer a nice API, an increasing number of vendors (including Rightbrain) are now providing that service.
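As a rough sketch of what self-hosting can look like in practice, here's a minimal request to a locally running Ollama server holding a DeepSeek-R1 distillation. The model tag (`deepseek-r1:8b`) and the assumption that Ollama is listening on its default port are illustrative only.

```python
# Sketch: querying a locally hosted DeepSeek-R1 distillation instead of the
# hosted API, so content never leaves your own infrastructure. Assumes an
# Ollama server on localhost with a "deepseek-r1:8b" model already pulled;
# the tag is an illustrative assumption, not a recommendation.
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/chat"  # Ollama's default local port


def build_local_chat_request(prompt: str,
                             model: str = "deepseek-r1:8b") -> urllib.request.Request:
    """Build a request that only ever targets the local server."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }).encode()
    return urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )


# To actually run it (requires a local Ollama install with the model pulled):
#   with urllib.request.urlopen(build_local_chat_request("Hello")) as r:
#       print(json.loads(r.read())["message"]["content"])
```

Because the URL is pinned to localhost, the content never transits DeepSeek's (or anyone else's) servers, which is rather the point.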

For businesses trying to mitigate risk, I think there are clear advantages to having alternatives to the big US providers, and particular advantages to those alternatives being open source. In a future post, I'll try to provide some detail around that.


Footnotes

[1] Obviously I am paraphrasing.

[2] Again, this varies by jurisdiction. Japan is fine.

[3] Data protection purists will notice several over-simplifications here. The EU maintains a list of countries whose law it considers acceptable. Again, Japan is fine. The US is included on a complicated conditional basis which keeps being challenged and re-instated with increasingly cumbersome names. Also, I am aware that the question of server location is less important than the possibility of logical access. I don't think any of this affects the larger point I am making, though.



4.3 In order to fulfill the requirements stipulated by laws and regulations or provide the Services specified in these Terms, and under the premise of secure encryption technology processing, strict de-identification rendering, and irreversibility to identify specific individuals, we may, to a minimal extent, use Inputs and Outputs to provide, maintain, operate, develop or improve the Services or the underlying technologies supporting the Services. If you refuse to allow us to process the data in the manner described above, you may provide feedback to us through the methods outlined in Section 10.

As you’ll have noticed, the terms are pretty much identical, except that the Customer terms have an additional paragraph allowing for DeepSeek to use customer data. But, remember, it’s all part of one big agreement. Those consumer terms also apply to API usage, unless the API terms contradict them. Does the fact that the API terms have reproduced the same language but omit the usage part mean that the API terms are contradicting the consumer terms? Does the fact these terms are governed by Chinese law affect the answer? My guess would be that the right to use data still stands but, frankly, I don’t have a clue. The only firm conclusion I can draw is that this is not a set of terms drafted to soothe the worries of twitchy compliance teams at large businesses.

Data Protection

US big tech, somewhat notoriously, has an uneasy relationship with EU data protection law. Put the name Google, Meta, Amazon or Microsoft into GDPRHub’s advanced search and you’ll find decisions dealing with those parties in respect of pretty much every GDPR article that a private actor can be accused of violating. In the context of businesses, though, I’m going to confine myself to Article 28 (Processor) and Chapter V (Data Transfers).  

For the uninitiated, controllers determine the "purposes and means of processing", and processors just follow the controller's instructions. Big tech companies provide long and detailed data processing agreements in which they, amongst other things like security guarantees, vendor management and audit rights, promise to only process customer data in accordance with the documented instructions of their customer. 

DeepSeek’s approach to data protection is, again, confusing. The data protection section of the terms is brief but, for me at least, kind of fascinating:

5.3 We will collect and process the personal information you provide as the data subject in accordance with the "DeepSeek Privacy Policy". However, when your end users access downstream systems, applications, or functions that you've developed based on the open platform, the processing rules for their collected personal information are not covered by this privacy policy. As the controller of personal information processing activities in that scenario, you should disclose the relevant privacy policy to your end users.

First off: are they a processor or not? They say that the customer is the data controller. And the processing of personal data contained in content submitted through the API isn’t subject to their privacy policy. So… are they a processor? If they are, why wouldn’t they say so?

The question is all the more frustrating because of the semi-fluent GDPR-speak. That’s presumably partly explained by the Chinese data protection framework which shares a lot of its approach with the GDPR. The Personal Information Protection Law of the People’s Republic of China, I learnt while writing this, defines “personal information processor” (PIP) as something directly analogous to “controller” (they even talk about “purposes and method”). It also has an (undefined) concept of “entrusted party of personal information” which is required to process personal information “as agreed [with the PIP] and shall not process personal information beyond the agreed purpose and method of processing.” So - a processor. Finally, the PIP is also required to “agree with the entrusted party on the purpose, duration, and method of entrusted processing, type and protection measures of personal information as well as the rights and obligations of both parties, and supervise the personal information processing activities of the entrusted party.” And, as with the GDPR and Data processing agreements, this agreement needs to be formalised in a “contract”.

While the above explains DeepSeek’s apparent familiarity with EU data protection concepts, it raises at least as many questions as it answers. Most importantly: where’s the contract? DeepSeek's terms don't make reference to one, and there's nothing on their site. It certainly sounds like DeepSeek is intending to be a processor - it seems like the logical implication of saying that the developer is the controller and DeepSeek isn’t. But the means of doing that is nowhere to be found. 

Maybe the answer here is that DeepSeek considers itself a joint controller (another European data protection concept mirrored in the PIPL)? This would, presumably, mean that DeepSeek was training its models on the API data, especially if my suspicions about controlling language between the Consumer and API terms in Section 4 are accurate. However, instead of assisting the developers or organisations using the data, DeepSeek offloads all the disclosure, handling of rights requests, and presumably, liability, onto them.  Clearly this would be worse from both the compliance and confidentiality points of view. But, mostly, it’s just really unclear what’s actually going on.

Obviously I’m not qualified to make any assessment of how well this works under Chinese law. But from the EU/UK standpoint, it’s fair to say that DeepSeek’s terms are pretty unsatisfactory.

Data Transfers

If you've worked in European tech for any length of time, you will be aware that, periodically, everyone gets worried about sending data to America for reasons that have something to do with US law and possibly Edward Snowden? The details of this have an only-partially-deserved reputation for being incredibly boring, and so are largely ignored by non-data protection specialists. But the nub of the problem is pretty straightforward. EU law gives people rights over data. But if you put that data on servers in a non-EU country, then that country's law applies, and it might not grant you those rights. That's not necessarily a problem in itself, because maybe the owner of the server can just commit to abide by EU law. What it can't do, however, is prevent the government of that country from just requesting your data from whoever is running the server, and then doing something awful with it. This is mostly a problem for America, because that is where many of the servers are. But, in principle, it's a potential problem for any non-EEA country.[3]

It will not have escaped your notice that DeepSeek is a Chinese company. Its primary market is in China, its servers are in China, and it is subject to Chinese law. You may also have heard that, in terms of government uses of data, China is not exactly universally acclaimed for it's human rights-first approach. You can see where this is heading. In theory, the data transfers to China story still has a lot of road left to run. The Irish regulator has been investigating TikTok’s transfers for the past God-knows-how-long, with no sign of a decision yet. Max Schrems (the guy who keeps demolishing the US data transfer efforts) has now got in on the action, and those cases will take a while to make it through the courts. But the European Data Protection Board (who, for these purposes, you can think of as the people who produce authoritative guesses about what the law means in the absence of anything better) has already published its thoughts on the possibility of lawful transfers to China. Those are long, detailed, and speculative in places, but I think they can fairly be summarised as follows:

It’s therefore hard to blame DeepSeek for not playing a GDPR compliance game that they have no realistic chance of winning. 

So the potential scope of government access is a compliance worry for businesses because it's going to be hard to say you're GDPR compliant if you're YOLO'ing customer data to China. But more fundamentally, though, there's a general confidentiality concern. Stories about industrial espionage by China have been a constant of the international business press for more than a decade. This has been heating up as the US has taken a more confrontational stance with China on trade issues, meaning that it’s probably going to get a lot worse before it gets better. Much of the reporting emphasises sophisticated measures, the increasing use of human intelligence and a whole load of other stuff designed to give CISO’s sleepless nights, but sending data into the country so that the CCP can just request it rather obviates those concerns.

What’s a Business to Do?

All of the above may make it sound like I have some axe to grind against DeepSeek, as well as a touchingly naive faith in US LLM providers. I really don’t. DeepSeek has one advantage over all the other providers covered, and it seems to me pretty decisive. You don’t actually have to trust their service in order to use their models: both R1 and V3 are available on open source terms. The most obvious US comparators are Meta’s Llama models, which are less powerful, less open, and subject to more restrictive license terms. 

Running DeepSeek’s models on more trusted hardware can solve most of the problems outlined above. It avoids the ‘sending confidential or sensitive data to China’ problem,  and means organisations are no longer subject to DeepSeek’s terms. They can also stop worrying about some of the other security problems DeepSeek may face. If hosting the model yourself sounds like too heavy a lift and you’d prefer a nice API, an increasing number of vendors (including rightBrain) are now providing that service. 

For a businesses trying to mitigate risk, I think there are clear advantages to having alternatives to the big US providers, and particular advantages to those alternatives being open source. In a future post, I'll try to provide some detail around that.


Footnotes

[1] Obviously I am paraphrasing.

[2] Again, this varies by jurisdiction. Japan is fine.

[3] Data protection purists will notice several over-simplifications here. The EU maintains a list of countries whose law it considers acceptable. Again, Japan is fine. The US is included on a complicated conditional basis which keeps being challenged and re-instated with increasingly cumbersome names. Also, I am aware that the question of server location is less important than the possibility of logical access. I don't think any of this affects the larger point I am making, though.



If you’re a compliance person being asked to approve use of DeepSeek, there’s a fair chance that the request is coming from the engineering department. If that’s the case, they’ll most likely be wanting to use DeepSeek’s API. 

In addition to its ChatBot, DeepSeek makes an API available so that developers can easily build its service into their own products and tools. This is a pretty standard thing for the large LLM providers to do, and developers are used to building with those tools. APIs are made available subject to the provider's terms of use. So how much difference is there between DeepSeek's API and those of the other big providers?

Actually, quite a lot.

No Business Offering

The fundamental difference between DeepSeek and other big providers is that, whilst DeepSeek have API terms, they don’t have business terms. The line between API and business offerings can often seem pretty indistinct. Indeed, the big LLM providers tend to elide the two: OpenAI and Anthropic combine their business and API terms, and production use of Google’s models tends to fall under its Google Cloud Platform terms. A casual reader might look at DeepSeek’s “Open Platform Terms of Service” (let’s call them “platform terms”) and conclude that they were dealing with a similar set-up. The fact that the terms contemplate “enterprise developer” customers does nothing to allay that impression.

But it’s the wrong impression. Really. DeepSeek’s API is better thought of as a more nerdy outgrowth of its consumer app. The Platform Terms themselves are presented as a supplement to DeepSeek's consumer Terms of Use (consumer terms). Both the platform terms and the consumer terms apply, with the platform terms taking priority. This is very different to the offerings of the big US providers set out above, who typically have a very clear division between their consumer and business terms. 

Why does that matter? Broadly, because businesses tend to care more about certain things than consumers do, and are in a slightly better position to get them. Businesses are, on average, more likely to read the terms (or pay someone to do it for them), to try to negotiate them, and to understand their legal and technical nuances. That results in some fairly predictable differences between consumer terms and business ones.

Output Liability

The question of liability for outputs is a pretty important one. The whole point of an LLM is to produce outputs, and many of the current policy debates in AI concern them. Are they accurate? Do they reflect systematic bias? Are they obscene? Could they be used as part of a nihilistic scheme to destroy the world?

In a business context you would usually (and, I think intuitively) expect the provider to take some responsibility for the outputs. If a business were to use DeepSeek's product to make fully automated hiring decisions (just a hypothetical - please do not actually do this), and those decisions turned out to be discriminatory, the affected candidate might sue the business. The business might rightly turn round to DeepSeek and say "This is a problem with your tool, what are you going to do about it?" And they would expect DeepSeek's terms to contain language to the effect of "we'll take some responsibility for the problem, so long as you yourself weren't doing something stupid."[1] The debate would then be over exactly when and to what extent DeepSeek was liable.

As Carey pointed out, DeepSeek's terms do the opposite, making the customer liable to DeepSeek for any harm resulting from the output. This seems… unfair? Weird, even? A business may have some limited control over what its users are putting into an LLM (more than the provider has, anyway). If it knew what the output from the provider was going to be, though, it wouldn't need an LLM in the first place. Asking the business to be liable for something it can't control seems like a deeply unreasonable request to make, and yet DeepSeek makes it.

This is a particularly acute problem with IP infringement. If an LLM is producing obscenity, or ducking questions about Tiananmen Square, or being blatantly discriminatory, it’s at least possible to test for that. But the question of whether an LLM’s output infringes any of the copyrights in its training data is a much harder one, both legally and practically. Depending on your jurisdiction, there's just less legal certainty in that area.

Take coding as a concrete example. Writing software is one thing LLMs are widely agreed to be genuinely useful for, and the idea that businesses might use LLMs to help write code seems intuitively plausible - and seems to have been borne out empirically. But those LLMs were probably trained on masses of open source code. If a business uses an LLM to help write code, might it end up liable for infringement? Or find its codebase subject to the terms of an open source license? It's really very hard to know until we have more clarity - either from caselaw or legislation.[2]

In the face of that sort of uncertainty, many businesses might opt to just wait and see how the legal questions turn out. Clearly this is not what you'd like if you're trying to sell LLMs. Sure enough, in response to concerns like this, the big providers started making guarantees around IP in late 2023. Anthropic, Google and OpenAI now all provide their business customers with IP indemnities. Here’s an example from OpenAI’s business terms:

We agree to defend and indemnify you for any damages finally awarded by a court of competent jurisdiction and any settlement amounts payable to a third party arising out of a third party claim alleging that the Services (including training data we use to train a model that powers the Services) infringe any third party intellectual property right. 

So, if you’re a business getting sued because someone says your LLM-generated content is a copy of their work, you can expect the big providers to take ownership of that problem for you. If you are a consumer, however, you’re on your own. Consumer terms tend not to contain IP indemnities and, at the risk of getting repetitive, DeepSeek’s terms are consumer terms.

Model Training

It’s not just LLM output you need to worry about, though. It’s standard practice for the big US LLM providers to train their models on the content (inputs and outputs) processed by their consumer apps. As with IP, confidentiality tends to be a pretty big concern for most businesses - the sort of thing that might keep them from using LLMs at all if there were too much uncertainty. Google’s language around the use of customer data submitted through Vertex AI is thus pretty typical of the sort of reassurance US providers offer: “Google will not use Customer Data to train or fine-tune any AI/ML models without Customer's prior permission or instruction.”

DeepSeek’s terms certainly don’t contain anything that categorical. What uses of customer data can DeepSeek make? I have been scratching my head about this for some time and, honestly, I still have no idea. 

This is the section of the API terms dealing with content usage in its entirety:

4.1 You are responsible for all Inputs you submit to our Services and corresponding Outputs. By submitting Inputs to our Services, you represent and warrant that you have all rights, licenses, and permissions that are necessary for us to process the Inputs under our Terms. You also represent and warrant that your submitting Inputs to us and corresponding Outputs will not violate our Terms, or any laws or regulations applicable to those Inputs and Outputs.

4.2 Subject to applicable law and our Terms, you have the following rights regarding the Inputs and Outputs of the Services: (1) You retain any rights, title, and interests—if any—in the Inputs you submit; (2) We assign any rights, title, and interests—if any—in the Outputs of the Services to you. (3) You may apply the Inputs and Outputs of the Services to a wide range of use cases, including personal use, academic research, derivative product development, training other models (such as model distillation), etc.

The relevant section of the consumer terms reads as follows:

4.1 You are responsible for all Inputs you submit to our Services and corresponding Outputs. By submitting Inputs to our Services, you represent and warrant that you have all rights, licenses, and permissions that are necessary for us to process the Inputs under our Terms. You also represent and warrant that your submitting Inputs to us and corresponding Outputs will not violate our Terms, or any laws or regulations applicable to those Inputs and Outputs.

4.2 Subject to applicable law and our Terms, you have the following rights regarding the Inputs and Outputs of the Services: (1) You retain any rights, title, and interests—if any—in the Inputs you submit; (2) We assign any rights, title, and interests—if any—in the Outputs of the Services to you. (3) You may apply the Inputs and Outputs of the Services to a wide range of use cases, including personal use, academic research, derivative product development, training other models (such as model distillation), etc.

4.3 In order to fulfill the requirements stipulated by laws and regulations or provide the Services specified in these Terms, and under the premise of secure encryption technology processing, strict de-identification rendering, and irreversibility to identify specific individuals, we may, to a minimal extent, use Inputs and Outputs to provide, maintain, operate, develop or improve the Services or the underlying technologies supporting the Services. If you refuse to allow us to process the data in the manner described above, you may provide feedback to us through the methods outlined in Section 10.

As you’ll have noticed, the terms are pretty much identical, except that the consumer terms have an additional paragraph allowing DeepSeek to use customer data. But, remember, it’s all part of one big agreement. Those consumer terms also apply to API usage, unless the API terms contradict them. Does the fact that the API terms reproduce the same language but omit the usage paragraph mean that the API terms contradict the consumer terms? Does the fact that these terms are governed by Chinese law affect the answer? My guess would be that the right to use data still stands but, frankly, I don’t have a clue. The only firm conclusion I can draw is that this is not a set of terms drafted to soothe the worries of twitchy compliance teams at large businesses.

Data Protection

US big tech, somewhat notoriously, has an uneasy relationship with EU data protection law. Put the names Google, Meta, Amazon or Microsoft into GDPRHub’s advanced search and you’ll find decisions dealing with those parties in respect of pretty much every GDPR article that a private actor can be accused of violating. In the context of businesses, though, I’m going to confine myself to Article 28 (Processor) and Chapter V (Data Transfers).

For the uninitiated, controllers determine the "purposes and means of processing", and processors just follow the controller's instructions. Big tech companies provide long and detailed data processing agreements in which they, amongst other things like security guarantees, vendor management and audit rights, promise to only process customer data in accordance with the documented instructions of their customer. 

DeepSeek’s approach to data protection is, again, confusing. The data protection section of the terms is brief but, for me at least, kind of fascinating:

5.3 We will collect and process the personal information you provide as the data subject in accordance with the "DeepSeek Privacy Policy". However, when your end users access downstream systems, applications, or functions that you've developed based on the open platform, the processing rules for their collected personal information are not covered by this privacy policy. As the controller of personal information processing activities in that scenario, you should disclose the relevant privacy policy to your end users.

First off: are they a processor or not? They say that the customer is the data controller. And the processing of personal data contained in content submitted through the API isn’t subject to their privacy policy. So… are they a processor? If they are, why wouldn’t they say so?

The question is all the more frustrating because of the semi-fluent GDPR-speak. That’s presumably partly explained by the Chinese data protection framework, which shares a lot of its approach with the GDPR. The Personal Information Protection Law of the People’s Republic of China, I learnt while writing this, defines “personal information processor” (PIP) as something directly analogous to “controller” (they even talk about “purposes and method”). It also has an (undefined) concept of “entrusted party of personal information” which is required to process personal information “as agreed [with the PIP] and shall not process personal information beyond the agreed purpose and method of processing.” So - a processor. Finally, the PIP is also required to “agree with the entrusted party on the purpose, duration, and method of entrusted processing, type and protection measures of personal information as well as the rights and obligations of both parties, and supervise the personal information processing activities of the entrusted party.” And, as with data processing agreements under the GDPR, this agreement needs to be formalised in a “contract”.

While the above explains DeepSeek’s apparent familiarity with EU data protection concepts, it raises at least as many questions as it answers. Most importantly: where’s the contract? DeepSeek's terms don't make reference to one, and there's nothing on their site. It certainly sounds like DeepSeek is intending to be a processor - it seems like the logical implication of saying that the developer is the controller and DeepSeek isn’t. But the means of doing that is nowhere to be found. 

Maybe the answer here is that DeepSeek considers itself a joint controller (another European data protection concept mirrored in the PIPL)? This would, presumably, mean that DeepSeek was training its models on the API data, especially if my suspicions about the controlling language between the consumer and API terms in Section 4 are accurate. However, instead of assisting the developers or organisations using the data, DeepSeek offloads all the disclosure, handling of rights requests and, presumably, liability onto them. Clearly this would be worse from both the compliance and confidentiality points of view. But, mostly, it’s just really unclear what’s actually going on.

Obviously I’m not qualified to make any assessment of how well this works under Chinese law. But from the EU/UK standpoint, it’s fair to say that DeepSeek’s terms are pretty unsatisfactory.

Data Transfers

If you've worked in European tech for any length of time, you will be aware that, periodically, everyone gets worried about sending data to America for reasons that have something to do with US law and possibly Edward Snowden? The details of this have an only-partially-deserved reputation for being incredibly boring, and so are largely ignored by non-data protection specialists. But the nub of the problem is pretty straightforward. EU law gives people rights over data. But if you put that data on servers in a non-EU country, then that country's law applies, and it might not grant you those rights. That's not necessarily a problem in itself, because maybe the owner of the server can just commit to abide by EU law. What it can't do, however, is prevent the government of that country from just requesting your data from whoever is running the server, and then doing something awful with it. This is mostly a problem for America, because that is where many of the servers are. But, in principle, it's a potential problem for any non-EEA country.[3]

It will not have escaped your notice that DeepSeek is a Chinese company. Its primary market is in China, its servers are in China, and it is subject to Chinese law. You may also have heard that, in terms of government uses of data, China is not exactly universally acclaimed for its human rights-first approach. You can see where this is heading. In theory, the data transfers to China story still has a lot of road left to run. The Irish regulator has been investigating TikTok’s transfers for the past God-knows-how-long, with no sign of a decision yet. Max Schrems (the guy who keeps demolishing the US data transfer arrangements) has now got in on the action, and those cases will take a while to make it through the courts. But the European Data Protection Board (who, for these purposes, you can think of as the people who produce authoritative guesses about what the law means in the absence of anything better) has already published its thoughts on the possibility of lawful transfers to China. Those are long, detailed, and speculative in places, but I think they can fairly be summarised as: lawful transfers to China are, for all practical purposes, next to impossible.

It’s therefore hard to blame DeepSeek for not playing a GDPR compliance game that they have no realistic chance of winning. 

So the potential scope of government access is a compliance worry for businesses: it's going to be hard to say you're GDPR compliant if you're YOLO'ing customer data to China. More fundamentally, though, there's a general confidentiality concern. Stories about industrial espionage by China have been a constant of the international business press for more than a decade. This has been heating up as the US has taken a more confrontational stance with China on trade issues, meaning it's probably going to get worse before it gets better. Much of the reporting emphasises sophisticated technical measures, the increasing use of human intelligence, and a whole load of other stuff designed to give CISOs sleepless nights - but sending data into the country so that the CCP can simply request it rather obviates the need for any of that.

What’s a Business to Do?

All of the above may make it sound like I have some axe to grind against DeepSeek, as well as a touchingly naive faith in US LLM providers. I really don't. DeepSeek has one advantage over all the other providers covered here, and it seems to me pretty decisive: you don't actually have to trust their service in order to use their models. Both R1 and V3 are available on open source terms. The most obvious US comparators are Meta's Llama models, which are less powerful, less open, and subject to more restrictive licence terms.

Running DeepSeek’s models on more trusted hardware can solve most of the problems outlined above. It avoids the ‘sending confidential or sensitive data to China’ problem, and means organisations are no longer subject to DeepSeek’s terms. They can also stop worrying about some of the other security problems DeepSeek may face. If hosting the model yourself sounds like too heavy a lift and you’d prefer a nice API, an increasing number of vendors (including Rightbrain) are now providing that service.

For businesses trying to mitigate risk, I think there are clear advantages to having alternatives to the big US providers, and particular advantages to those alternatives being open source. In a future post, I'll try to flesh that out in more detail.


Footnotes

[1] Obviously I am paraphrasing.

[2] Again, this varies by jurisdiction. Japan is fine.

[3] Data protection purists will notice several over-simplifications here. The EU maintains a list of countries whose law it considers acceptable. Again, Japan is fine. The US is included on a complicated conditional basis which keeps being challenged and re-instated with increasingly cumbersome names. Also, I am aware that the question of server location is less important than the possibility of logical access. I don't think any of this affects the larger point I am making, though.

