Where you decide to shard for scalability impacts the complexity of the entire architecture.
Sharding has become a popular means of achieving scalability in application architectures in which read/write data separation is not only possible, but desirable to achieve new heights of concurrency. The premise is that by splitting up read and write duties, it is possible to get better overall performance at the cost of a slight delay in consistency. That is, it takes a bit of time to replicate changes initiated by a "write" on the master database to the read-only replicas. It's eventually consistent, and it's generally considered an acceptable trade-off when searching for higher and higher scalability.
While the most well-known cases of read/write separation and sharding are based on geography - east coast versus west coast, for example - there are other cases where localized sharding has also been put into play with great success. Generally these types of architectures base their sharding decisions on user names, splitting them up between databases based on statistical analysis of occurrences. This achieves greater scalability at the data layer by better distributing the rate at which writes (which are generally speaking a blocking action) occur, and thus achieving greater scale and concurrency for only a slight period of inconsistency.
The mechanism is, in theory, quite simple and is loosely based on a principle taught in most computer science algorithms classes: hashing. Basically, when a request comes in, the application looks at some piece of data - like the user name - and based on that data decides which of X databases to send it to. How that division is determined is not as relevant (to this discussion, anyway) as the action itself. It's like registration at an event or in college where you're split up based on the first letter of your last name. You remember: everyone whose name starts with A-G goes in this line, the rest of you go over there, in those lines.
That's sharding. And it's most commonly implemented in the application, where the connection to a database is created and used to manage the data that is the lifeblood of every application and business today.
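To make that concrete, here is a minimal sketch of what the sharding decision inside an application might look like. The database names, the letter ranges beyond A-G, and both function names are my own assumptions for illustration; they are not from any particular product. The first function mirrors the registration-line analogy, the second shows the hashing flavor.

```python
import hashlib
from typing import Sequence

# Hypothetical shards: only the "A-G" range comes from the analogy above,
# the other ranges and all database names are made up for this sketch.
RANGES = [("A", "G", "db-a-g"), ("H", "P", "db-h-p"), ("Q", "Z", "db-q-z")]


def shard_by_initial(user_name: str) -> str:
    """Pick a database from the first letter of the user name."""
    first = user_name.strip()[:1].upper()
    for low, high, database in RANGES:
        if low <= first <= high:
            return database
    # Names that do not start with A-Z (digits, empty, etc.) go to the last shard.
    return RANGES[-1][2]


def shard_by_hash(user_name: str, databases: Sequence[str]) -> str:
    """Pick a database by hashing the user name (stable across processes)."""
    digest = hashlib.sha256(user_name.lower().encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % len(databases)
    return databases[bucket]


if __name__ == "__main__":
    for name in ("alice", "mallory", "Zed"):
        print(name, "->", shard_by_initial(name))
```

Either way, every application instance carries this logic and needs to be able to reach every database, because any request might land on any instance.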
Now, I told you all that to share another approach to sharding; one that takes advantage of programmability in the network (data path programmability, to be precise).
In the first scenario, in which sharding occurs in the application, there's almost certainly (I'd be willing to bet real live money) a load balancing service in front. It's distributing requests to a pool (cluster) of application instances, each of which individually decides which database to talk to given the data available. If we insert some intelligence - some programmability - into that load balancing service, we move the sharding decision in front of the application.
Now when a request comes in, the load balancing service examines the data available and decides to which application instance the request should be sent. The data is likely the same - a user identity - but may be something more applicable to the service, say a product name or number extricated from the URL of a RESTful API fronting a microservice.
Basically what you're doing is taking the block of code responsible for sharding from inside the application and moving it into the network.
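As a rough sketch of what that moved block of code might look like, here is the same kind of decision expressed as a routing rule. I've written it in Python to keep it readable; a real load balancing service would use its own scripting language. The URL layout (`/products/<number>`), the pool names, and the `choose_pool` function are all assumptions for illustration.

```python
import re
from urllib.parse import urlsplit

# Hypothetical pools: each is a group of identical application instances
# configured to talk to exactly one database.
POOLS = ["pool-0", "pool-1", "pool-2"]
DEFAULT_POOL = "pool-0"

# Assumed RESTful layout: /products/<number>[/...]
PRODUCT_PATH = re.compile(r"^/products/(\d+)(?:/|$)")


def choose_pool(url: str) -> str:
    """Decide which application pool should receive the request."""
    path = urlsplit(url).path
    match = PRODUCT_PATH.match(path)
    if match is None:
        # Requests that carry no product number go to a default pool.
        return DEFAULT_POOL
    product_number = int(match.group(1))
    return POOLS[product_number % len(POOLS)]


if __name__ == "__main__":
    for url in ("https://api.example.com/products/7/reviews", "/products/11?x=1", "/health"):
        print(url, "->", choose_pool(url))
```

The application behind each pool never sees this logic. It simply receives the requests that belong to its database.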
The diagram to your left (or at least it was on the left when I wrote this) illustrates the difference. The reason the "apps" in the "network" illustration are colored is to highlight that each of them is dedicated to a specific database. The code - the app itself - is all the same. There's no difference except for the configuration that tells it "you are dedicated to database A-G".
This is starkly different from the "in the application" sharding example in which all instances are exactly the same, including the configuration, as each one may be at any time talking to any one of the databases, depending on the data received.
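One way to picture the configuration difference is the sketch below. The `InstanceConfig` type, the instance names, and the database names are hypothetical; the point is only that the code is identical in both approaches and the list of databases is what changes.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical database names used only for this sketch.
ALL_DATABASES = ("db-a-g", "db-h-p", "db-q-z")


@dataclass(frozen=True)
class InstanceConfig:
    name: str
    databases: Tuple[str, ...]  # databases this instance may connect to


def in_application_configs(instance_count: int) -> List[InstanceConfig]:
    """Every instance gets the same configuration: all databases."""
    return [InstanceConfig(f"app-{i}", ALL_DATABASES) for i in range(instance_count)]


def in_network_configs(instances_per_pool: int) -> List[InstanceConfig]:
    """Each pool of instances is dedicated to exactly one database."""
    configs = []
    for database in ALL_DATABASES:
        for i in range(instances_per_pool):
            configs.append(InstanceConfig(f"app-{database}-{i}", (database,)))
    return configs


if __name__ == "__main__":
    for config in in_network_configs(2):
        print(config.name, "->", config.databases)
```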
Now, I believe it's obvious (because I colored all the database connections) that when you shard in the application, the complexity of the network is pretty high, and so is the load on each database, because it has to maintain connections with each and every application instance. Operational Axiom #2 tells us "as load increases, performance decreases", so it's likely that sharding in the application has a negative impact on performance.
Conversely, the network complexity in the sharding in the network approach is fairly low (and straightforward) and actually simplifies the entire architecture. The load on each database remains lower because there is only one instance (or pool of instances) with which it needs to manage connections.
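A bit of back-of-the-envelope arithmetic shows the difference. This sketch assumes one connection per instance/database pair (no connection pooling) and pools of equal size; both are simplifications of mine, and the function names are hypothetical.

```python
def in_application(app_instances: int, databases: int) -> dict:
    """Every instance connects to every database."""
    return {
        "total_connections": app_instances * databases,
        "connections_per_database": app_instances,
    }


def in_network(instances_per_pool: int, databases: int) -> dict:
    """Each pool connects only to the one database it is dedicated to."""
    return {
        "total_connections": instances_per_pool * databases,
        "connections_per_database": instances_per_pool,
    }


if __name__ == "__main__":
    # 12 application instances and 3 databases in both cases.
    print("in the application:", in_application(app_instances=12, databases=3))
    print("in the network:    ", in_network(instances_per_pool=4, databases=3))
```

With 12 instances and 3 databases, sharding in the application means 36 connections overall and 12 per database. Splitting the same 12 instances into 3 dedicated pools means 12 connections overall and 4 per database.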
The negative of the "in the network" approach is that you have another component (service) that must be managed - that means application lifecycle management applies - and there are likely separate configurations necessary for each of the pools (clusters) responsible for scaling out each application instance (because each pool only talks to one of the databases). But this negative also means that the code responsible for sharding is localized. It is itself a "network microservice" that is small and isolated, meaning it can be tweaked independently of the application code. That's a positive, especially if you might need to increase the level of sharding or change its core mechanism in order to scale the application. That's one of the reasons microservices are growing more popular; the localization and isolation ensure the ability to change without disruptive impact on other services or applications.
Taking advantage of programmability in the network to achieve new levels of scalability while simplifying your architecture is another reason programmability in the network is an invaluable tool in your architectural toolbox.