# Quantum Autoencoder

 Copyright (c) 2021 Institute for Quantum Computing, Baidu Inc. All Rights Reserved. 

## Overview

This tutorial will show how to train a quantum autoencoder to compress and reconstruct a given quantum state (mixed state) [1].

### Theory

The form of the quantum autoencoder is very similar to that of the classical autoencoder: it is composed of an encoder $E$ and a decoder $D$. For an input quantum state $\rho_{in}$ of an $N$-qubit system (here we use the density operator representation of quantum mechanics to describe mixed states), we first use the encoder $E = U(\theta)$ to encode the information into a subset of the qubits in the system. This subset is denoted by **system $A$**. After measuring and discarding the remaining qubits (denoted by **system $B$**), we get the compressed quantum state $\rho_{encode}$. The dimension of the compressed quantum state is the same as the dimension of the quantum system $A$. If $N_A$ qubits are needed to describe system $A$, then the encoded quantum state $\rho_{encode}$ is a $2^{N_A}\times 2^{N_A}$ matrix. Note that the mathematical operation corresponding to the measure-and-discard step is the partial trace. The reader can intuitively treat it as the inverse operation of the tensor product $\otimes$.

Let us look at a specific example. Given a quantum state $\rho_A$ of $N_A$ qubits and another quantum state $\rho_B$ of $N_B$ qubits, the quantum state of the entire system composed of subsystems $A$ and $B$ is $\rho_{AB} = \rho_A \otimes \rho_B$, which is a state of $N = N_A + N_B$ qubits. Now we let the entire system evolve under the action of a unitary matrix $U$ to get a new quantum state $\tilde{\rho_{AB}} = U\rho_{AB}U^\dagger$. If we only want the new quantum state $\tilde{\rho_A}$ of subsystem $A$, what should we do? We simply measure subsystem $B$ and then discard it. This operation is the partial trace $\tilde{\rho_A} = \text{Tr}_B (\tilde{\rho_{AB}})$. With Paddle Quantum, we can call the built-in function `partial_trace(rho_AB, 2**N_A, 2**N_B, 2)` to complete this operation. **Note:** The last argument is 2, which means that we discard the second subsystem, here system $B$.
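To make the partial trace concrete, here is a minimal NumPy sketch. The helper `partial_trace_np` is a hypothetical name introduced only for illustration; it is not the Paddle Quantum function, although we assume the same argument convention (the last argument selects the subsystem to discard). For a product state $\rho_A \otimes \rho_B$, tracing out one subsystem should return the other unchanged.

```python
import numpy as np

def partial_trace_np(rho, dim_first, dim_second, discard):
    """Trace out the first (discard=1) or second (discard=2) subsystem.

    Illustrative helper, not part of Paddle Quantum.
    """
    # Split row and column indices into (first, second) subsystem indices
    r = rho.reshape(dim_first, dim_second, dim_first, dim_second)
    if discard == 1:
        return np.einsum('ijil->jl', r)  # sum over the first subsystem
    if discard == 2:
        return np.einsum('ijkj->ik', r)  # sum over the second subsystem
    raise ValueError("discard must be 1 or 2")

N_A, N_B = 2, 1
rho_A_demo = np.diag([0.5, 0.3, 0.2, 0.0]).astype(complex)      # 2-qubit mixed state
rho_B_demo = np.array([[0.5, 0.5], [0.5, 0.5]], dtype=complex)  # |+><+|
rho_AB_demo = np.kron(rho_A_demo, rho_B_demo)

recovered_A = partial_trace_np(rho_AB_demo, 2**N_A, 2**N_B, 2)  # discard B
recovered_B = partial_trace_np(rho_AB_demo, 2**N_A, 2**N_B, 1)  # discard A
print(np.allclose(recovered_A, rho_A_demo), np.allclose(recovered_B, rho_B_demo))
```

For entangled states the reduced state is generally mixed, which is why the autoencoder needs training to push the discarded subsystem toward a fixed pure state.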

![QA-fig-encoder_pipeline](./figures/QA-fig-encoder_pipeline.png)

After discussing the encoding process, let us take a look at how decoding is done. To decode the quantum state $\rho_{encode}$, we need to introduce an ancillary system $C$ with the same dimension as system $B$ and take its initial state to be $|0\dots0\rangle$. We then apply the decoder $D = U^\dagger(\theta)$ to the entire quantum system $A+C$ to decode the compressed information in system $A$. We hope that the final quantum state $\rho_{out}$ is as similar as possible to $\rho_{in}$, and we use the Uhlmann-Jozsa fidelity $F$ to measure the similarity between them.

$$
F(\rho_{in}, \rho_{out}) = \left(\operatorname{tr} \sqrt{\sqrt{\rho_{in}} \rho_{out} \sqrt{\rho_{in}}} \right)^{2}.
\tag{1}
$$

Finally, by optimizing the encoder's parameters, we can improve the fidelity of $\rho_{in}$ and $\rho_{out}$ as much as possible.
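Equation (1) can be evaluated directly with NumPy. The sketch below is an illustration under our own assumptions: the helper names `psd_sqrt` and `fidelity` are hypothetical, and the matrix square roots are computed from eigendecompositions, which is valid for Hermitian positive semidefinite matrices such as density operators.

```python
import numpy as np

def psd_sqrt(mat):
    """Square root of a Hermitian positive semidefinite matrix."""
    vals, vecs = np.linalg.eigh(mat)
    vals = np.clip(vals, 0.0, None)  # remove tiny negative round-off
    return (vecs * np.sqrt(vals)) @ vecs.conj().T

def fidelity(rho, sigma):
    """Uhlmann-Jozsa fidelity as written in equation (1)."""
    s = psd_sqrt(rho)
    inner = s @ sigma @ s
    # tr sqrt(M) equals the sum of the square roots of the eigenvalues of M
    vals = np.clip(np.linalg.eigvalsh(inner), 0.0, None)
    return float(np.sum(np.sqrt(vals)) ** 2)

rho_demo = np.diag([0.7, 0.3]).astype(complex)
sigma_demo = np.diag([0.5, 0.5]).astype(complex)
zero_demo = np.diag([1.0, 0.0]).astype(complex)
one_demo = np.diag([0.0, 1.0]).astype(complex)

f_same = fidelity(rho_demo, rho_demo)     # identical states
f_orth = fidelity(zero_demo, one_demo)    # orthogonal pure states
f_mixed = fidelity(rho_demo, sigma_demo)  # two commuting mixed states
print(f_same, f_orth, f_mixed)
```

For commuting states the formula reduces to $\left(\sum_i \sqrt{p_i q_i}\right)^2$, which gives a convenient hand check for `f_mixed`.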

## Paddle Quantum Implementation

Next, we will use a simple example to show the workflow of the quantum autoencoder. Here we first import the necessary packages.

In [2]:
import numpy as np
from numpy import diag
import scipy
import scipy.stats
import paddle
from paddle import matmul, trace, kron, real
from paddle_quantum.circuit import UAnsatz
from paddle_quantum.utils import dagger, state_fidelity, partial_trace

### Generating the initial state

Let us consider a quantum state $\rho_{in}$ of $N = 3$ qubits. First, the encoder compresses the information into the last two qubits (system $A$), and we measure and discard the first qubit (system $B$). Second, we introduce another qubit (the ancillary system $C$) in state $|0\rangle$ to replace the discarded qubit $B$. Finally, the decoder restores the compressed information in $A$ to $\rho_{out}$. Here, we assume that the initial state is a mixed state whose spectrum is $\lambda_i \in \{0.4, 0.2, 0.2, 0.1, 0.1, 0, 0, 0\}$, and we generate $\rho_{in}$ by applying a random unitary transformation to the corresponding diagonal matrix.



In [3]:
N_A = 2 # Number of qubits in system A
N_B = 1 # Number of qubits in system B
N = N_A + N_B # Total number of qubits

np.random.seed(1) # Fixed random seed (scipy.random was only a deprecated alias of numpy.random)
V = scipy.stats.unitary_group.rvs(2**N) # Generate a random unitary matrix
D = diag([0.4, 0.2, 0.2, 0.1, 0.1, 0, 0, 0]) # Enter the spectrum of the target state rho
V_H = V.conj().T # Apply Hermitian transpose
rho_in = (V @ D @ V_H).astype('complex128') # Generate rho_in

# Initialize the quantum system C
rho_C = np.diag([1,0]).astype('complex128')
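As a quick sanity check, a state built as $V D V^\dagger$ should be Hermitian, have unit trace, and carry exactly the prescribed spectrum. The sketch below uses only NumPy. The QR-based random unitary is our stand-in for `scipy.stats.unitary_group.rvs` (an assumption: without an extra phase correction it is unitary but not Haar-distributed), and the names `V_check` and `rho_check` are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
dim = 2 ** 3

# Random unitary from the QR decomposition of a complex Gaussian matrix
G = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
V_check, _ = np.linalg.qr(G)

spectrum = np.array([0.4, 0.2, 0.2, 0.1, 0.1, 0.0, 0.0, 0.0])
rho_check = V_check @ np.diag(spectrum) @ V_check.conj().T

is_hermitian = np.allclose(rho_check, rho_check.conj().T)
trace_value = np.trace(rho_check).real
eigs = np.sort(np.linalg.eigvalsh(rho_check))[::-1]  # descending order
purity = np.trace(rho_check @ rho_check).real        # sum of squared eigenvalues

print(is_hermitian, round(trace_value, 6), np.round(eigs, 6), round(purity, 6))
```

A purity below 1 confirms that the state is mixed, as assumed in this example.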

### Building a quantum neural network

Here, we use quantum neural networks (QNN) as the encoder and decoder. Suppose system $A$ has $N_A$ qubits, systems $B$ and $C$ each have $N_B$ qubits, and the QNN consists of `cir_depth` layers. The encoder $E$ acts on the total system composed of $A$ and $B$, and the decoder $D$ acts on the total system composed of $A$ and $C$. In this example, $N_{A} = 2$ and $N_{B} = 1$.
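The whole encode, discard, and decode pipeline can be sketched in plain NumPy. This is only an illustration under stated assumptions: `E_demo` is a random unitary standing in for a trained QNN, and the input state is constructed so that it is perfectly compressible by that encoder. In this ideal case the round trip should reproduce the input exactly.

```python
import numpy as np

rng = np.random.default_rng(7)
dim_B, dim_A = 2, 4

# Stand-in encoder: a random unitary (not a trained circuit)
G = rng.normal(size=(8, 8)) + 1j * rng.normal(size=(8, 8))
E_demo, _ = np.linalg.qr(G)
D_demo = E_demo.conj().T  # decoder is the adjoint of the encoder

ket0 = np.diag([1.0, 0.0]).astype(complex)  # |0><0|
rho_A_demo = np.diag([0.4, 0.3, 0.2, 0.1]).astype(complex)

# Input that the encoder maps exactly onto |0><0| (system B) tensor rho_A
rho_in_demo = D_demo @ np.kron(ket0, rho_A_demo) @ D_demo.conj().T

# Encode, then trace out the first subsystem (B) to keep system A
rho_BA_demo = E_demo @ rho_in_demo @ E_demo.conj().T
rho_encode_demo = np.einsum(
    'ijil->jl', rho_BA_demo.reshape(dim_B, dim_A, dim_B, dim_A))

# Attach the fresh ancilla C in |0> and decode
rho_CA_demo = np.kron(ket0, rho_encode_demo)
rho_out_demo = D_demo @ rho_CA_demo @ D_demo.conj().T

round_trip_ok = np.allclose(rho_out_demo, rho_in_demo)
print(round_trip_ok)
```

For a generic input, some weight leaks into system $B$ and is lost when it is discarded, which is what training tries to minimize.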

In [4]:
# Set circuit parameters
cir_depth = 6 # Circuit depth
block_len = 2 # The length of each block
theta_size = N*block_len*cir_depth # The size of the circuit parameter theta


# Build the encoder E
def Encoder(theta):

    # Initialize the network with UAnsatz
    cir = UAnsatz(N)

    # Build the network by layers
    for layer_num in range(cir_depth):

        # One Ry and one Rz rotation on every qubit
        for which_qubit in range(N):
            cir.ry(theta[block_len*layer_num*N + which_qubit], which_qubit)
            cir.rz(theta[(block_len*layer_num + 1)*N + which_qubit], which_qubit)

        # Ring of CNOT gates: 0 -> 1 -> ... -> N-1 -> 0
        for which_qubit in range(N-1):
            cir.cnot([which_qubit, which_qubit + 1])
        cir.cnot([N-1, 0])

    return cir
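The indexing into `theta` in `Encoder` is easy to get wrong, so it is worth checking that each parameter is used exactly once. The short sketch below repeats only the index arithmetic from the cell above, with no quantum library required; `ry_index`, `rz_index`, and `used` are names introduced here for illustration.

```python
N = 3
cir_depth = 6
block_len = 2
theta_size = N * block_len * cir_depth

used = []
for layer_num in range(cir_depth):
    for which_qubit in range(N):
        # Same index expressions as in Encoder
        ry_index = block_len * layer_num * N + which_qubit
        rz_index = (block_len * layer_num + 1) * N + which_qubit
        used.extend([ry_index, rz_index])

# Every index from 0 to theta_size - 1 should appear exactly once
covers_all = sorted(used) == list(range(theta_size))
print(theta_size, covers_all)
```

Each layer therefore consumes a contiguous block of `block_len * N` parameters: first the `N` angles for the $R_y$ gates, then the `N` angles for the $R_z$ gates.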

### Configuring the training model: loss function

Here, we define the loss function to be

$$
Loss = 1-\langle 0...0|\rho_{trash}|0...0\rangle,
\tag{2}
$$

where $\rho_{trash}$ is the quantum state of the system $B$ discarded after encoding. Then we train the QNN through PaddlePaddle to minimize the loss function. If the loss function reaches 0, the input state and output state will be exactly the same state. This means that we have achieved compression and decompression perfectly, in which case the fidelity of the initial and final states is $F(\rho_{in}, \rho_{out}) = 1$.
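The loss in equation (2) can be illustrated with NumPy alone. In this sketch the helper names `trash_state` and `loss_from_encoded` are hypothetical, and we assume the same ordering as in this tutorial's code, where system $B$ is the first subsystem of the encoded state. A perfectly compressed state leaves $B$ in $|0\rangle$ and gives zero loss, while a state that leaks weight $p$ into $|1\rangle$ gives loss $p$.

```python
import numpy as np

def trash_state(rho_BA, dim_B, dim_A):
    """Reduced state of system B (trace out the second subsystem, A)."""
    r = rho_BA.reshape(dim_B, dim_A, dim_B, dim_A)
    return np.einsum('ijkj->ik', r)

def loss_from_encoded(rho_BA, dim_B, dim_A):
    """Loss = 1 - <0...0| rho_trash |0...0>, as in equation (2)."""
    rho_trash = trash_state(rho_BA, dim_B, dim_A)
    return 1.0 - rho_trash[0, 0].real

dim_B, dim_A = 2, 4
rho_A_demo = np.diag([0.4, 0.3, 0.2, 0.1]).astype(complex)
proj0 = np.diag([1.0, 0.0]).astype(complex)  # |0><0|
proj1 = np.diag([0.0, 1.0]).astype(complex)  # |1><1|

perfect = np.kron(proj0, rho_A_demo)
leaky = 0.9 * np.kron(proj0, rho_A_demo) + 0.1 * np.kron(proj1, rho_A_demo)

loss_perfect = loss_from_encoded(perfect, dim_B, dim_A)
loss_leaky = loss_from_encoded(leaky, dim_B, dim_A)
print(loss_perfect, loss_leaky)
```

Because the loss depends only on the discarded system, it can be evaluated without running the decoder at all.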

In [5]:
# Set hyper-parameters
N_A = 2 # Number of qubits in system A
N_B = 1 # Number of qubits in system B
N = N_A + N_B # Total number of qubits
LR = 0.2 # Set the learning rate
ITR = 100 # Set the number of iterations
SEED = 15 # Fixed random number seed for initializing parameters

class NET(paddle.nn.Layer):
    def __init__(self, shape, dtype='float64'):
        super(NET, self).__init__()

        # Convert Numpy array to Tensor supported in PaddlePaddle
        self.rho_in = paddle.to_tensor(rho_in)
        self.rho_C = paddle.to_tensor(rho_C)
        self.theta = self.create_parameter(shape=shape,
                                           default_initializer=paddle.nn.initializer.Uniform(low=0.0, high=2 * np.pi),
                                           dtype=dtype, is_bias=False)

    # Define loss function and forward propagation mechanism
    def forward(self):

        # Generate initial encoder E and decoder D
        cir = Encoder(self.theta)
        E = cir.U
        E_dagger = dagger(E)
        D = E_dagger
        D_dagger = E

        # Encode the quantum state rho_in
        rho_BA = matmul(matmul(E, self.rho_in), E_dagger)

        # Take partial_trace() to get rho_encode and rho_trash
        rho_encode = partial_trace(rho_BA, 2 ** N_B, 2 ** N_A, 1)
        rho_trash = partial_trace(rho_BA, 2 ** N_B, 2 ** N_A, 2)

        # Decode the quantum state rho_out
        rho_CA = kron(self.rho_C, rho_encode)
        rho_out = matmul(matmul(D, rho_CA), D_dagger)

        # Calculate the loss function with rho_trash
        zero_Hamiltonian = paddle.to_tensor(np.diag([1,0]).astype('complex128'))
        loss = 1 - real(trace(matmul(zero_Hamiltonian, rho_trash)))

        return loss, self.rho_in, rho_out, cir


paddle.seed(SEED)
# Generate network
net = NET([theta_size])
# Generally speaking, we use Adam optimizer to get relatively good convergence
# Of course, it can be changed to SGD or RMS prop.
opt = paddle.optimizer.Adam(learning_rate=LR, parameters=net.parameters())

# Optimization loops
for itr in range(1, ITR + 1):
    # Forward propagation for calculating loss function
    loss, rho_in, rho_out, cir = net()
    # Use back propagation to minimize the loss function
    loss.backward()
    opt.minimize(loss)
    opt.clear_grad()
    # Calculate and print fidelity
    fid = state_fidelity(rho_in.numpy(), rho_out.numpy())
    if itr % 10 == 0:
        print('iter:', itr, 'loss:', '%.4f' % loss, 'fid:', '%.4f' % np.square(fid))
    if itr == ITR:
        print("\nThe trained circuit:")
        print(cir)

iter: 10 loss: 0.1683 fid: 0.8211
iter: 20 loss: 0.1231 fid: 0.8720
iter: 30 loss: 0.1122 fid: 0.8810
iter: 40 loss: 0.1058 fid: 0.8864
iter: 50 loss: 0.1025 fid: 0.8901
iter: 60 loss: 0.1019 fid: 0.8907
iter: 70 loss: 0.1013 fid: 0.8914
iter: 80 loss: 0.1012 fid: 0.8917
iter: 90 loss: 0.1010 fid: 0.8921
iter: 100 loss: 0.1008 fid: 0.8924

The trained circuit:
--Ry(3.935)----Rz(2.876)----*---------X----Ry(2.678)----Rz(6.372)----*---------X----Ry(5.516)----Rz(4.082)----*---------X----Ry(1.199)----Rz(1.584)----*---------X----Ry(4.512)----Rz(0.847)----*---------X----Ry(5.038)----Rz(0.564)----*---------X--
 | | | | | | | | | | | | 
--Ry(2.045)----Rz(4.282)----X----*----|----Ry(6.116)----Rz(6.203)----X----*----|----Ry(5.135)----Rz(4.828)----X----*----|----Ry(3.532)----Rz(3.827)----X----*----|----Ry(0.497)----Rz(1.693)----X----*----|----Ry(5.243)----Rz(5.329)----X----*----|--
 | | | | | | | | | | | | 
--Ry(2.706)----Rz(4.168)---------X----*----Ry(2.141)----Rz(2.014)---------X----*----Ry(5.36

If the dimension of system $A$ is denoted by $d_A$, it can be shown that the maximum fidelity achievable by the quantum autoencoder is the sum of the $d_A$ largest eigenvalues of $\rho_{in}$. In our case $d_A = 4$ and the maximum fidelity is

$$
F_{\text{max}}(\rho_{in}, \rho_{out}) = \sum_{j=1}^{d_A} \lambda_j(\rho_{in})= 0.4 + 0.2 + 0.2 + 0.1 = 0.9.
\tag{3}
$$

After 100 iterations, the fidelity achieved by the quantum autoencoder we trained reaches above 0.89, which is very close to the optimal value.
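The eigenvalue sum in equation (3) is straightforward to check numerically, and the same quantity bounds the loss in equation (2): since $1 - Loss$ is the weight of the encoded state on a $d_A$-dimensional subspace, it cannot exceed the sum of the $d_A$ largest eigenvalues, so the loss cannot drop below $0.1$ here. The sketch below is an illustration under our own assumptions: it uses a QR-based random unitary instead of the tutorial's `rho_in`, and it builds a hypothetical encoder `U_demo` directly from the eigenvectors rather than from a trained circuit.

```python
import numpy as np

rng = np.random.default_rng(0)
dim, d_A = 8, 4
spectrum = np.array([0.4, 0.2, 0.2, 0.1, 0.1, 0.0, 0.0, 0.0])

# Random state with the prescribed spectrum
G = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
V_demo, _ = np.linalg.qr(G)
rho_demo = V_demo @ np.diag(spectrum) @ V_demo.conj().T

# Sort eigenpairs by descending eigenvalue
vals, vecs = np.linalg.eigh(rho_demo)
order = np.argsort(vals)[::-1]
vals, vecs = vals[order], vecs[:, order]
top_sum = float(np.sum(vals[:d_A]))  # right-hand side of equation (3)

# Encoder sending the k-th eigenvector to the k-th computational basis state
U_demo = vecs.conj().T
rho_BA_demo = U_demo @ rho_demo @ U_demo.conj().T

# B is the first qubit, so <0|rho_trash|0> is the trace of the upper-left block
kept_weight = float(np.trace(rho_BA_demo[:d_A, :d_A]).real)
loss_demo = 1.0 - kept_weight
print(round(top_sum, 6), round(loss_demo, 6))
```

The loss of about 0.1008 reached after 100 training iterations is consistent with this lower bound.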

_______

## References

[1] Romero, J., Olson, J. P. & Aspuru-Guzik, A. Quantum autoencoders for efficient compression of quantum data. [Quantum Sci. Technol. 2, 045001 (2017).](https://iopscience.iop.org/article/10.1088/2058-9565/aa8072)

