Commit fa5b0a9c authored by xixiaoyao

fix bugs and refine reader name

Parent e180d15b
@@ -2,5 +2,8 @@
 __pycache__
 pretrain_model
 output_model
+build
+dist
+paddle_palm.egg-info
 mrqa_output
 *.log
@@ -5,10 +5,6 @@ save_path: "output_model/firstrun"
 backbone: "bert"
 backbone_config_path: "pretrain_model/bert/bert_config.json"
-vocab_path: "pretrain_model/bert/vocab.txt"
-do_lower_case: True
-max_seq_len: 512
 batch_size: 5
 num_epochs: 3
 optimizer: "adam"
......
task_instance: "mrqa"
save_path: "output_model/firstrun"
backbone: "bert"
backbone_config_path: "pretrain_model/bert/bert_config.json"
vocab_path: "pretrain_model/bert/vocab.txt"
do_lower_case: True
max_seq_len: 512
batch_size: 5
num_epochs: 3
optimizer: "adam"
learning_rate: 3e-5
warmup_proportion: 0.1
weight_decay: 0.1
print_every_n_steps: 1
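For orientation, the relationship between the global settings above and the per-task YAML files (the `reader`/`paradigm` fragments below) can be sketched with plain dicts. This is a hypothetical illustration of the override behavior one would expect from such a layered config, not PaddlePALM's actual merge logic; the key names come from the config files in this commit, while the `merge` helper and the task-level `batch_size` override are assumptions.

```python
# Hypothetical illustration: task-level settings shadow the global ones.
# Key names are taken from the configs in this commit; the merge function
# itself is an assumption, not PaddlePALM's real implementation.

global_conf = {
    "task_instance": "mrqa",
    "save_path": "output_model/firstrun",
    "backbone": "bert",
    "batch_size": 5,
    "num_epochs": 3,
    "optimizer": "adam",
    "learning_rate": 3e-5,
}

task_conf = {
    "reader": "mrc",
    "paradigm": "mrc",
    "max_seq_len": 512,
    "batch_size": 4,  # hypothetical task-level override
}

def merge(global_conf, task_conf):
    """Task-level keys override global ones (assumed behavior)."""
    merged = dict(global_conf)
    merged.update(task_conf)
    return merged

conf = merge(global_conf, task_conf)
print(conf["batch_size"])  # -> 4, the task-level value wins
print(conf["optimizer"])   # -> adam, the global value falls through
```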
label text_a
1 when was the last time the san antonio spurs missed the playoffshave only missed the playoffs four times since entering the NBA ; they have not missed the playoffs in the 20 seasons since Tim Duncan was drafted by the Spurs in 1997 . With their 50th win in the 2016 -- 17 season , the Spurs extended their record for most consecutive 50 - win seasons to 18 ( the Spurs did not
0 the creation of the federal reserve system was an attempt toReserve System ( also known as the Federal Reserve or simply the Fed ) is the central banking system of the United States of America . Over the years , events such as the Great Depression in the 1930s and the Great Recession during the 2000s have led to the expansion of the
2 group f / 64 was a major backlash against the earlier photographic movement off / 64 was formed , Edward Weston went to a meeting of the John Reed Club , which was founded to support Marxist artists and writers . These circumstances not only helped set up the situation in which a group
0 Bessarabia eventually became under the control of which country?city of Vilnius – its historical capital, which was under Polish control during the inter-war
0 Iran's inflation led to what in 1975-1976?the economy of Iran was flooded with foreign currency, which caused inflation. By 1974, the economy of Iran was experiencing double digit inflation, and despite many large projects to modernize the country, corruption was rampant and caused large
1 How many steam warships did Japan have in 1867?Yokosuka and Nagasaki. By the end of the Tokugawa shogunate in 1867, the Japanese navy of the shogun already possessed eight western-style steam warships around the flagship Kaiyō Maru, which were used against pro-imperial forces during the Boshin war, under the command
0 How many people were inside?f former NFL head coach Dan Reeves, suffered a broken back. DeCamillis was seen on a stretcher wearing a neck brace. A line of heavy thunderstorms was moving through the Dallas area at the time, he said, but no other damage to buildings was reported, said Mike Adams, a dispatcher for the Irving, Texas, fire department. Watch the roof collapse on players, coaches » Arnold Payne, a photographer for WFAA, was shooting the Cowboys' practice session when rain began falling "tremendously hard." "I noticed the walls started to waver ... and then I noticed that the lights that were hanging from the ceiling started to sway, and it wouldn't stop," Payne told CNN. Shortly after that, he said, "It was as if someone took a stick pin and hit a balloon." Watch Payne describe being inside when structure collpased » Payne said
0 Ishita Dutta is the sister of an actress who is typically cast in what genre of movies?he suspense thriller film "Drishyam" (2015) and the Hindi soap opera "Ek Ghar Banaunga", that aired on Star Plus. She is the younger sister of actress Tanushree Dutta. Dutta is the recipient of Femina Miss India Universe title in 2004. During the same year
3 when did the the civil war start and end/Th> </Tr> <Tr> <Td> <P> 110,000 + killed in action / died of wounds 230,000 + accident / disease deaths 25,000 -- 30,000 died in Confederate prisons </P> <P> 365,000 + total dead
1 What has Pakistan told phone companies?Islamabad, Pakistan (CNN) -- Under heavy criticism for a telling cell phone carriers to ban certain words in text messages, the Pakistan Telecommunication Authority went into damage control mode Wednesday. PTA spokesman Mohammed Younis Wednesday denied the existence of the plan, which has met with derision from mobile phone users in the country. "If at all we finally decide to
0 What did Bush say the proposal was to a proposal he vetoed before?(CNN) -- President Bush vetoed an expansion of the federally funded, state-run health insurance program for poor children for a second time Wednesday, telling Congress the bill "moves our country's health care system in the wrong direction." In his veto message, President Bush calls on Congress to extend funding for the current program. "Because the Congress has chosen to send me an essentially identical bill that has the same problems as the flawed bill I previously vetoed, I must veto this legislation, too," he said in a statement released by the White House. The bill would
0 Where did the football team that Bob Simmons coached from 1995 to 2000 play their home games?Cowboys football team [SEP] The 1998 Oklahoma State Cowboys football team represented the Oklahoma State University during the 1998 NCAA Division I-A football season. They participated as members of the Big 12 Conference in the South Division. They were coached by head coach Bob Simmons. [PAR] [TLE] Bob Simmons (American football coach) [SEP] Bob
2 What anniversary was recently celebrated in Iran?us to move our policy in a new direction," Obama said. "So there are going to be a set of objectives that we have in these conversations, but I think that there's the possibility at least of a relationship of mutual respect and progress." The United States and Iran have not had diplomatic relations since 1979. During that year, the Shah of Iran was forced to flee the country and the Ayatollah Khomeini took power. Later that year, Iranian students took over and seized hostages at the U.S. Embassy. Relations have been cut since then. U.S. President George W. Bush labeled Iran as a member of the "axis of evil" after the Sept. 11, 2001 attacks. Iran celebrated the 30th anniversary of the revolution Tuesday with crowds chanting "Death to America." Watch the parade in Iran » Tensions have rippled over issues such as Iran's nuclear program, Israel, and Iraq, and have been aggravated since the outspoken Ahmadinejad came to power in 2005. Western
1 Which Italian composer did George Balanchine add in 1976?[PAR] [TLE] Arcangelo Corelli [SEP] Arcangelo Corelli ( ; 17 February 1653 – 8 January 1713) was an Italian violinist and composer of the Baroque era. His music
0 Will the playstation 4 be announced?a new system sometime in the next five years, of course. Sony continued to sell the PlayStation 2 system and games years after the PlayStation 3 debuted in stores. For Sony's next console, the company will not deploy a streaming delivery system like OnLive, or fully cut out disc retailers like Best Buy and GameStop, Hirai said. While Sony has increased the number of games and other media available for download or streaming through its networks, most people cannot be expected to frequently download several gigabytes worth of data, which can be a time-consuming process, he said. Sony Computer Entertainment president Andrew House said earlier that Sony is not planning to discuss a new console, the website ComputerAndVideogames.com reported on Monday.
1 How many children were the Americans trying to kidnap out of Haiti?Port-au-Prince, Haiti (CNN) -- A Haitian attorney representing 10 Americans charged with kidnapping for trying to take 33 children out of Haiti told CNN Sunday he has resigned. Edwin Coq said he had quit as a lawyer for the Americans. It wasn't immediately clear who would replace him. "I know that they have been looking at other lawyers," said Phyllis Allison, mother of one of those detained, Jim Allen. "They don't know what to do." The 10 missionaries, including group leader Laura Silsby, were charged Thursday with kidnapping children and criminal association. Coq had said that court hearings would be held Monday
0 who kills tree gelbman in happy death dayTree convinces Carter of her predicament by showing that she holds foreknowledge of the day 's events . Tree admits to Carter she does n't like who
0 What will no person be denied the enjoyment of in Georgia based on their religious principles?amended as follows: "Article IV. Section 10. No person within this state shall, upon any pretense, be deprived of the inestimable privilege of worshipping God in any
0 who came up with the idea of footballpass . The popularity of college football grew as it became the dominant version of the sport in the United States for the first half of the 20th century . Bowl games , a college football tradition , attracted a national audience for college
0 what is the name of the female smurfbefore the smurflings created Sassette , Smurfette was the only female smurf in the Smurf Village .
3 Who contributed to the American studies programs at Yale and University of Wyoming?struggle. Norman Holmes Pearson, who worked for the Office of Strategic Studies in London during World War II, returned to Yale and headed the new American studies program, in which scholarship quickly became an instrument of promoting
0 What is the group's former name that now has an office with the Chief Actuary besides the Social Security Administration?Office of the Chief Actuary [SEP] The Office of the Chief Actuary is a government agency that has responsibility for actuarial estimates regarding social welfare programs. In Canada, the Office of the Chief Actuary works with the Canada Pension Plan and the Old Age Security Program. In the United States, both the Social Security Administration and the Centers for Medicare and Medicaid Services have an Office of the Chief Actuary that deals with Social Security and Medicare, respectively. A similar agency in the United Kingdom is called the Government Actuary's Department
0 The actor that playes Han Solo in the "Star Wars" film series stars with Blake Lively and Michiel Huisman in a film directed by who?about a woman who stops aging after an accident at the age of 29. Mills Goodloe and Salvador Paskowitz. The film stars Blake Lively, Michiel Huisman, Kathy Baker, Amanda Crew, Harrison Ford, and Ellen Burstyn. The film was theatrically released on April 24, 2015 by Lionsgate. [PAR] [TLE] Harrison Ford [SEP] Harrison Ford
0 What historically black university's men's basketball coach was formerly head coach at Virginia Tech?well as an 1890 Historically Black Land-Grant University. The University is a member-school of the Thurgood Marshall College Fund. He was also the head coach at Virginia Tech, Tennessee
0 what year did syracuse win the ncaa tournament. Their combined record is 67 -- 39 .
1 where do i get chips at a casino<P> Money is exchanged for tokens in a casino at the casino cage , at the gaming tables , or at a cashier station . The tokens are
0 when was the winter fuel payment first introducedheating over the winter months .
0 Trophy hunting can include areas which would likely be unsuitable for what other types of ecotourism?study states that less than 3% of a trophy hunters' expenditures reach the local level, meaning that the economic incentive and benefit is "minimal, particularly when we consider the vast areas of
1 In simple language, what are the interconnections in an embedding matrix?Since it was quite easy to stack interconnections (wires) inside the embedding matrix, the approach allowed designers to forget completely about the routing of wires (usually a time-consuming operation of PCB design): Anywhere the designer needs a connection, the machine will draw a wire in straight line from one location/pin
2 rho has been to the most all star games in baseballn4 </Li> <Li> Stan Musial 24 </Li>
0 In 1169, Ireland was invaded by which people?High King to ensure the terms of the Treaty of Windsor led Henry II, as King of England, to rule as effective monarch under the title of Lord of Ireland. This title was granted to his younger son but when Henry's heir unexpectedly died the title of King of England and Lord of Ireland became entwined in one
1 What year did a biracial Populist fusion gain the Governors office?to the legislature and governor's office, but the Populists attracted voters displeased with them. In 1896 a biracial, Populist-Republican Fusionist coalition gained the governor's office. The Democrats regained control of the legislature
1 nearest metro station to majnu ka tilla delhiRing Road of Delhi . It is at a walkable distance from ISBT Kashmere Gate . It is approachable through the Kashmeri Gate station of the Delhi Metro , lies on both the Red ( Dilshad Garden - Rithala ) and Yellow Lines ( Samaypur
3 where is california located in the region of the united states<P> California is a U.S. state in the Pacific Region of the United States . With 39.5 million residents , California is the most populous state in the United States and the third largest by area . The
1 when did the baptist church start in americacoworker for religious freedom , are variously credited as founding the earliest Baptist church in North America . In 1639 , Williams established a Baptist church in Providence , Rhode Island , and Clarke began a Baptist church in
0 where was the first capital of the united states locatedpassed to pave the way for a permanent capital . The decision to locate the capital was contentious , but Alexander Hamilton helped broker a compromise in which the federal government would take on war debt incurred during the American Revolutionary War , in exchange for support from northern states for locating the capital along the Potomac
0 What will new regulations will reduce?products off of the consumer market," said Michael Fry, director of conservation advocacy for the American Bird Conservancy. "By putting these restrictions in place, they are allowing a compromise to be made between themselves and organizations who have been working on this problem for a long time." The EPA's new measures, which were handed down Thursday, require that rat poisons be kept in bait stations above ground and in containers that meet agency standards. Loose bait, such as pellets, and the four most hazardous types of pesticides, known as "second-generation anticoagulants," will no longer be sold for personal use. Under the new restrictions, only farmers, livestock owners and certified rodent control employees will be allowed to purchase rat poison in bulk. Bags larger than 8 pounds will no longer be sold at hardware and home-improvement stores. Children who come into contact
0 who played lois lane in the man of steelmixture of toughness and vulnerability , but Peter Bradshaw thought that the character was `` sketchily conceived '' and criticized her lack of chemistry with Cavill . Even so , the film earned over $660 million to become one of her biggest box
0 What year did the writer of the 1968 novel "The Iron Man" become Poet Laurete?Giant is a 1999 American animated science-fiction comedy-drama action film using both traditional animation and computer animation, produced by and directed by Brad Bird in his directorial debut. It is based on the 1968 novel "The Iron Man" by Ted Hughes (which was published in the United States as "The Iron Giant") and was scripted by Tim McCanlies from a story treatment by Bird. The film stars the voices of Eli Marienthal,
2 The conquest of Nice was an effort by Suleiman and what French king?allies. A month prior to the siege of Nice, France supported the Ottomans with an artillery unit during the 1543 Ottoman conquest of Esztergom in northern Hungary. After further advances by the Turks, the Habsburg ruler Ferdinand officially recognized Ottoman ascendancy in Hungary in
0 when was the vaccine receivedfor swine flu, also known as 2009 H1N1, using reverse genetics, he said. "Suitable viruses will hopefully be sent to manufacturers by end of next week," Skinner wrote. Once that happens, vaccine makers will tweak the virus and have "pilot lots" of vaccine ready to be tested by mid- to late June. Several thousand cases have been reported
1 What is the nationality of the actor who costarred with Matt LeBlanc in "All the Queen's Men"?n approximate -99.92% return. [PAR] [TLE] Eddie Izzard [SEP] Edward John "Eddie" Izzard ( ; born 7 February 1962) is an English stand-up comedian, actor, writer and political activist. His comedic style takes the form of rambling, whimsical monologue, and self-referential pantomime. He
0 What sickened thousands of children?executives detained, a local official said, according to Xinhua, Initial tests showed more than 1,300 children in the Hunan province town of Wenping have excessive lead in their blood from the Wugang Manganese Smelting Plant. A second round of testing has been ordered to confirm the results. The plant opened in May 2008 without gaining the approval of the local environment protection bureau, said Huang Wenbin, a deputy environment chief in Wugang City, Xinhua reported. The plant was within 500 meters (about a quarter mile) of three schools. The
0 What percentage of the population are the Kpelle?are descendants of African American and West Indian, mostly Barbadian settlers, make up 2.5%. Congo people, descendants of repatriated Congo and Afro-Caribbean
1 Amount of people left homeless?86 dead, the state news agency said. About 30 people are missing, the official news agency Agencia Brasil said, citing civil defense officials. Earlier reports had indicated as many as 100 people were dead. In addition, more than 54,000 residents have been left homeless, and another 1.5 million have been affected by the heavy rains, the state news agency reported. Brazilian President Luiz Inacio Lula da Silva announced he will release nearly 700 million reais ($350 million)
2 What other countries were in disagreement with the United Nations decision on Burma ?that strongly called upon the government of Myanmar to end its systematic violations of human rights. In January 2007, Russia and China vetoed a
0 Besides Barcelona and Real Madrid, what other team has remained in the Primera Division?first football club to win six out of six competitions in a single year, completing the sextuple in also winning the Spanish Super Cup, UEFA Super Cup and FIFA Club World Cup. In 2011, the club became
0 William Frederick Truax, is a former professional American football tight end in the National Football League (NFL) from 1964 to 1973 for the Los Angeles Rams and the Dallas Cowboys, following the 1970 NFL season, Truax was traded by the Rams to the Cowboys for wide receiver Lance Rentzel, a former American football flanker, in which organization?in New Orleans and college football at Louisiana State University and was drafted in the second round of the 1964 NFL draft. Following the 1970 NFL season, Truax was traded by the Rams to the Cowboys for wide receiver Lance Rentzel. He was part of the Cowboys' Super Bowl VI championship team in 1971. He played
3 What year did Chopin learn that the uprising in Warsaw was crushed?enlist. Chopin, now alone in Vienna, was nostalgic for his homeland, and wrote to a friend, "I curse the moment of my departure." When in September 1831 he learned, while travelling from Vienna to Paris, that the uprising had been crushed, he expressed his anguish in the pages of his private journal: "Oh
1 where do they make money in washington dc; all coinage is produced by the United States Mint . With production facilities in Washington , DC , and Fort Worth , Texas , the Bureau of Engraving and Printing is the largest producer of government security documents in the United States . </P>
0 What did a researcher compare this process to?which makes it one of the highest rates of maternal mortality in the Americas. In wealthy developed nations, only nine women die for every 100,000 births. The five main causes of pregnancy-related deaths in Peru are hemorrhage, pre-eclampsia, infection, complications following abortion and obstructed birth, according to Peru's Ministry of Health figures. Amnesty's Peru researcher Nuria Garcia said, in a written statement: "The rates of maternal mortality in Peru are scandalous. The fact that so many women are dying from preventable causes is a human rights violation. "The Peruvian state is simply ignoring
0 How many containers can Longtan Containers Port Area handle?Port of Nanjing is the largest inland port in China, with annual cargo tonnage reached 191,970,000 t in 2012. The port area is 98 kilometres (61 mi) in length and has 64 berths
0 The 2011 New York City Marathon was sponsored by which Dutch multinational banking corporation?are retail banking, direct banking, commercial banking, investment banking, asset management, and insurance services. ING is an abbreviation for "Internationale Nederlanden Groep " (English: International Netherlands Group). [PAR] [TLE] 2011 New York City Marathon [SEP] The 42nd New York City Marathon took
0 What is human flourishing?it does not involve believing that human nature is purely good or that all people can live up to the Humanist ideals without help. If anything, there is recognition that living up to one's potential is hard
0 What was the result of Dida appealto play in next month's Champions League match at Shakhtar Donetsk after partially winning his appeal to UEFA against a two-match ban. Dida has had one game of his two-match ban suspended for a year following an appeal to UEFA. Brazilian Dida was also fined 60,000 Swiss francs by European football's ruling body following an incident involving a supporter during the Champions clash against Celtic in Scotland on October 3. The 34-year-old Brazilian was initially banned for two games for his theatrics following a Celtic fan's encroachment onto the pitch during the 2-1 defeat at Celtic
1 What is more plentiful in capital projects?generates economic distortion in the public sector by diverting public investment into capital projects where bribes and kickbacks are more plentiful. Officials may increase the technical complexity of public sector projects to conceal or
0 where were band greeted with cheers?the United States for a show in Stamford, Connecticut, on Tuesday, after they have "a few days off to recuperate," Robinson said. The trio was the opening act for Nelson until they were loudly booed in Toronto, a day after the actor-musician's bizarre interview with a CBC radio host. Ironically, the comments that offended Canadians included Thornton's assessment that they were "very reserved" and "it doesn't matter what you say to them." "It's mashed potatoes with no gravy," Thornton told CBC host Jian Ghomeshi. "We tend to play places where people throw things at each other and here they just sort of sit there," he said. Watch Thornton's interview » The audience at Thursday night's show in Toronto loudly booed the Boxmasters, with some shouts of "Here comes the gravy!" The Toronto Star newspaper reported. Thornton's remarks about
0 What do Mexicans call Mexico City?the Federal District in Spanish: D.F., which is read "De-Efe"). They are formally called capitalinos (in reference to the city being the capital of the country), but "[p]erhaps because capitalino is the
0 where does lock stock and barrel come fromindividual components one at a time . One craftsman made the `` lock '' which would have been a `` match lock '' , `` wheel lock '' , `` flint lock '' etc .
1 who has the power to establish a prison system<P> The Federal Bureau of Prisons ( BOP ) is a United States federal law enforcement agency . A subdivision of
0 what are south americas only 2 landlocked countriessuch countries , including five partially recognised states .
The source diff is too large to display. You can view the blob instead.
@@ -5,6 +5,3 @@ if __name__ == '__main__':
     controller.load_pretrain('pretrain_model/bert/params')
     controller.train()
-
-    controller = palm.Controller(config='config_demo1.yaml', task_dir='demo1_tasks', for_train=False)
-    controller.pred('mrqa', inference_model_dir='output_model/firstrun/infer_model')
-train_file: data/mrqa/mrqa-combined.train.raw.json
+train_file: data/mrqa/train.json
-pred_file: data/mrqa/mrqa-combined.dev.raw.json
-pred_output_path: 'mrqa_output'
-reader: mrc4ernie
+reader: mrc
 paradigm: mrc
+vocab_path: "pretrain_model/bert/vocab.txt"
+do_lower_case: True
+max_seq_len: 512
 doc_stride: 128
 max_query_len: 64
-max_answer_len: 30
-n_best_size: 20
-null_score_diff_threshold: 0.0
-verbose: False
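Taken together, after this change the demo1 MRC task config would read roughly as follows. This is a reconstruction from the diff above, with line order assumed, not a verbatim copy of the file:

```yaml
train_file: data/mrqa/train.json
reader: mrc
paradigm: mrc
vocab_path: "pretrain_model/bert/vocab.txt"
do_lower_case: True
max_seq_len: 512
doc_stride: 128
max_query_len: 64
```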
@@ -3,8 +3,9 @@ import paddlepalm as palm
 if __name__ == '__main__':
     controller = palm.Controller('config_demo2.yaml', task_dir='demo2_tasks')
     controller.load_pretrain('pretrain_model/ernie/params')
-    # controller.train()
+    controller.train()
     # controller = palm.Controller(config='config_demo2.yaml', task_dir='demo2_tasks', for_train=False)
     # controller.pred('mrqa', inference_model_dir='output_model/secondrun/infer_model')
-train_file: "data/match4mrqa/train.txt"
+train_file: "data/match4mrqa/train.tsv"
-reader: match4ernie
+reader: match
 paradigm: match
-train_file: "data/mlm4mrqa/train.txt"
+train_file: "data/mlm4mrqa/train.tsv"
 reader: mlm
 paradigm: mlm
-train_file: data/mrqa/mrqa-combined.train.raw.json
+train_file: data/mrqa/train.json
-pred_file: data/mrqa/mrqa-combined.dev.raw.json
+pred_file: data/mrqa/dev.json
 pred_output_path: 'mrqa_output'
-reader: mrc4ernie
+reader: mrc
 paradigm: mrc
 doc_stride: 128
 max_query_len: 64
......
import paddlepalm as palm

if __name__ == '__main__':
    controller = palm.Controller('config_demo3.yaml', task_dir='demo3_tasks')
    controller.load_pretrain('pretrain_model/ernie/params')
    controller.train()

    controller = palm.Controller(config='config_demo3.yaml', task_dir='demo3_tasks', for_train=False)
    controller.pred('cls4mrqa', inference_model_dir='output_model/thirdrun/infer_model')
train_file: data/cls4mrqa/train.tsv
reader: cls
paradigm: cls
W1028 21:51:59.319365 9630 device_context.cc:235] Please NOTE: device: 0, CUDA Capability: 61, Driver API Version: 10.1, Runtime API Version: 9.0
W1028 21:51:59.323333 9630 device_context.cc:243] device: 0, cuDNN Version: 7.3.
I1028 21:52:26.817137 9630 parallel_executor.cc:421] The number of CUDAPlace, which is used in ParallelExecutor, is 8. And the Program will be copied 8 copies
W1028 21:52:41.982228 9630 fuse_all_reduce_op_pass.cc:72] Find all_reduce operators: 401. To make the speed faster, some all_reduce ops are fused during training, after fusion, the number of all_reduce ops is 255.
I1028 21:52:42.243458 9630 build_strategy.cc:363] SeqOnlyAllReduceOps:0, num_trainers:1
I1028 21:53:14.242537 9630 parallel_executor.cc:285] Inplace strategy is enabled, when build_strategy.enable_inplace = True
I1028 21:53:16.313246 9630 parallel_executor.cc:368] Garbage collection strategy is enabled, when FLAGS_eager_delete_tensor_gb = 0
/home/zhangyiming/env-bert/lib/python2.7/site-packages/paddle/fluid/executor.py:774: UserWarning: The following exception is not an EOF exception.
"The following exception is not an EOF exception.")
Traceback (most recent call last):
  File "demo2.py", line 6, in <module>
    controller.train()
  File "/home/ssd7/yiming/release/PALM/paddlepalm/mtl_controller.py", line 669, in train
    fluid.io.save_persistables(self.exe, save_path, saver_program)
  File "/home/zhangyiming/env-bert/lib/python2.7/site-packages/paddle/fluid/io.py", line 571, in save_persistables
    filename=filename)
  File "/home/zhangyiming/env-bert/lib/python2.7/site-packages/paddle/fluid/io.py", line 216, in save_vars
    filename=filename)
  File "/home/zhangyiming/env-bert/lib/python2.7/site-packages/paddle/fluid/io.py", line 256, in save_vars
    executor.run(save_program)
  File "/home/zhangyiming/env-bert/lib/python2.7/site-packages/paddle/fluid/executor.py", line 775, in run
    six.reraise(*sys.exc_info())
  File "/home/zhangyiming/env-bert/lib/python2.7/site-packages/paddle/fluid/executor.py", line 770, in run
    use_program_cache=use_program_cache)
  File "/home/zhangyiming/env-bert/lib/python2.7/site-packages/paddle/fluid/executor.py", line 817, in _run_impl
    use_program_cache=use_program_cache)
  File "/home/zhangyiming/env-bert/lib/python2.7/site-packages/paddle/fluid/executor.py", line 894, in _run_program
    fetch_var_name)
paddle.fluid.core_avx.EnforceNotMet:
--------------------------------------------
C++ Call Stacks (More useful to developers):
--------------------------------------------
0 std::string paddle::platform::GetTraceBackString<char const*>(char const*&&, char const*, int)
1 paddle::platform::EnforceNotMet::EnforceNotMet(std::__exception_ptr::exception_ptr, char const*, int)
2 paddle::operators::SaveOpKernel<paddle::platform::CUDADeviceContext, float>::SaveLodTensor(paddle::framework::ExecutionContext const&, boost::variant<paddle::platform::CUDAPlace, paddle::platform::CPUPlace, paddle::platform::CUDAPinnedPlace, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_> const&, paddle::framework::Variable const*) const
3 paddle::operators::SaveOpKernel<paddle::platform::CUDADeviceContext, float>::Compute(paddle::framework::ExecutionContext const&) const
4 std::_Function_handler<void (paddle::framework::ExecutionContext const&), paddle::framework::OpKernelRegistrarFunctor<paddle::platform::CUDAPlace, false, 0ul, paddle::operators::SaveOpKernel<paddle::platform::CUDADeviceContext, float>, paddle::operators::SaveOpKernel<paddle::platform::CUDADeviceContext, double>, paddle::operators::SaveOpKernel<paddle::platform:
I1029 10:38:26.419725 30194 parallel_executor.cc:421] The number of CUDAPlace, which is used in ParallelExecutor, is 8. And the Program will be copied 8 copies
W1029 10:38:48.046470 30194 fuse_all_reduce_op_pass.cc:72] Find all_reduce operators: 401. To make the speed faster, some all_reduce ops are fused during training, after fusion, the number of all_reduce ops is 255.
I1029 10:38:48.322405 30194 build_strategy.cc:363] SeqOnlyAllReduceOps:0, num_trainers:1
I1029 10:39:23.302821 30194 parallel_executor.cc:285] Inplace strategy is enabled, when build_strategy.enable_inplace = True
I1029 10:39:25.419924 30194 parallel_executor.cc:368] Garbage collection strategy is enabled, when FLAGS_eager_delete_tensor_gb = 0
W1029 10:42:46.438006 30194 init.cc:212] *** Aborted at 1572316966 (unix time) try "date -d @1572316966" if you are using GNU date ***
W1029 10:42:46.440183 30194 init.cc:212] PC: @ 0x0 (unknown)
W1029 10:42:46.440296 30194 init.cc:212] *** SIGTERM (@0x1f80000785a) received by PID 30194 (TID 0x7f0773d5e700) from PID 30810; stack trace: ***
W1029 10:42:46.441951 30194 init.cc:212] @ 0x7f0773528160 (unknown)
W1029 10:42:46.443789 30194 init.cc:212] @ 0x7f07735243cc __pthread_cond_wait
W1029 10:42:46.444838 30194 init.cc:212] @ 0x7f0726a0c3cc std::condition_variable::wait()
W1029 10:42:46.449384 30194 init.cc:212] @ 0x7f070292290d paddle::framework::details::FastThreadedSSAGraphExecutor::Run()
W1029 10:42:46.450734 30194 init.cc:212] @ 0x7f07028836a7 _ZNSt17_Function_handlerIFvvEZN6paddle9framework7details29ScopeBufferedSSAGraphExecutor3RunERKSt6vectorISsSaISsEEEUlvE_E9_M_invokeERKSt9_Any_data
W1029 10:42:46.454063 30194 init.cc:212] @ 0x7f07028884bf paddle::framework::details::ScopeBufferedMonitor::Apply()
W1029 10:42:46.455735 30194 init.cc:212] @ 0x7f0702883e86 paddle::framework::details::ScopeBufferedSSAGraphExecutor::Run()
W1029 10:42:46.458518 30194 init.cc:212] @ 0x7f0700626038 paddle::framework::ParallelExecutor::Run()
W1029 10:42:46.459216 30194 init.cc:212] @ 0x7f0700409e78 _ZZN8pybind1112cpp_function10initializeIZN6paddle6pybindL22pybind11_init_core_avxERNS_6moduleEEUlRNS2_9framework16ParallelExecutorERKSt6vectorISsSaISsEEE188_S9_INS6_9LoDTensorESaISF_EEIS8_SD_EINS_4nameENS_9is_methodENS_7siblingEEEEvOT_PFT0_DpT1_EDpRKT2_ENUlRNS_6detail13function_callEE1_4_FUNESY_
W1029 10:42:46.460702 30194 init.cc:212] @ 0x7f0700453f56 pybind11::cpp_function::dispatcher()
W1029 10:42:46.462498 30194 init.cc:212] @ 0x7f0773841cc8 PyEval_EvalFrameEx
W1029 10:42:46.464206 30194 init.cc:212] @ 0x7f077384435d PyEval_EvalCodeEx
W1029 10:42:46.465894 30194 init.cc:212] @ 0x7f0773841d50 PyEval_EvalFrameEx
W1029 10:42:46.467593 30194 init.cc:212] @ 0x7f077384435d PyEval_EvalCodeEx
W1029 10:42:46.469327 30194 init.cc:212] @ 0x7f0773841d50 PyEval_EvalFrameEx
W1029 10:42:46.471053 30194 init.cc:212] @ 0x7f077384435d PyEval_EvalCodeEx
W1029 10:42:46.472759 30194 init.cc:212] @ 0x7f0773841d50 PyEval_EvalFrameEx
W1029 10:42:46.474479 30194 init.cc:212] @ 0x7f077384435d PyEval_EvalCodeEx
W1029 10:42:46.476193 30194 init.cc:212] @ 0x7f0773841d50 PyEval_EvalFrameEx
W1029 10:42:46.477926 30194 init.cc:212] @ 0x7f077384435d PyEval_EvalCodeEx
W1029 10:42:46.479651 30194 init.cc:212] @ 0x7f0773844492 PyEval_EvalCode
W1029 10:42:46.481353 30194 init.cc:212] @ 0x7f077386e1a2 PyRun_FileExFlags
W1029 10:42:46.483080 30194 init.cc:212] @ 0x7f077386f539 PyRun_SimpleFileExFlags
W1029 10:42:46.484786 30194 init.cc:212] @ 0x7f07738851bd Py_Main
W1029 10:42:46.487162 30194 init.cc:212] @ 0x7f0772a82bd5 __libc_start_main
W1029 10:42:46.487229 30194 init.cc:212] @ 0x4007a1 (unknown)
W1029 10:42:46.488940 30194 init.cc:212] @ 0x0 (unknown)
./run_demo2.sh: line 5: 30194 Terminated python demo2.py >> demo2.log
W1029 10:43:27.495725 32687 device_context.cc:235] Please NOTE: device: 0, CUDA Capability: 61, Driver API Version: 10.1, Runtime API Version: 9.0
W1029 10:43:27.500324 32687 device_context.cc:243] device: 0, cuDNN Version: 7.3.
I1029 10:43:41.409127 32687 parallel_executor.cc:421] The number of CUDAPlace, which is used in ParallelExecutor, is 8. And the Program will be copied 8 copies
W1029 10:44:03.299010 32687 fuse_all_reduce_op_pass.cc:72] Find all_reduce operators: 401. To make the speed faster, some all_reduce ops are fused during training, after fusion, the number of all_reduce ops is 255.
I1029 10:44:03.584228 32687 build_strategy.cc:363] SeqOnlyAllReduceOps:0, num_trainers:1
I1029 10:44:39.690382 32687 parallel_executor.cc:285] Inplace strategy is enabled, when build_strategy.enable_inplace = True
I1029 10:44:42.244774 32687 parallel_executor.cc:368] Garbage collection strategy is enabled, when FLAGS_eager_delete_tensor_gb = 0
W1029 10:48:20.253201 32687 init.cc:212] *** Aborted at 1572317300 (unix time) try "date -d @1572317300" if you are using GNU date ***
W1029 10:48:20.255347 32687 init.cc:212] PC: @ 0x0 (unknown)
W1029 10:48:20.255458 32687 init.cc:212] *** SIGTERM (@0x1f80000785a) received by PID 32687 (TID 0x7f0f71d25700) from PID 30810; stack trace: ***
W1029 10:48:20.257107 32687 init.cc:212] @ 0x7f0f714ef160 (unknown)
W1029 10:48:20.258708 32687 init.cc:212] @ 0x7f0f714eb3cc __pthread_cond_wait
W1029 10:48:20.259734 32687 init.cc:212] @ 0x7f0f249d33cc std::condition_variable::wait()
W1029 10:48:20.263964 32687 init.cc:212] @ 0x7f0f008e990d paddle::framework::details::FastThreadedSSAGraphExecutor::Run()
W1029 10:48:20.265229 32687 init.cc:212] @ 0x7f0f0084a6a7 _ZNSt17_Function_handlerIFvvEZN6paddle9framework7details29ScopeBufferedSSAGraphExecutor3RunERKSt6vectorISsSaISsEEEUlvE_E9_M_invokeERKSt9_Any_data
W1029 10:48:20.268503 32687 init.cc:212] @ 0x7f0f0084f4bf paddle::framework::details::ScopeBufferedMonitor::Apply()
W1029 10:48:20.270135 32687 init.cc:212] @ 0x7f0f0084ae86 paddle::framework::details::ScopeBufferedSSAGraphExecutor::Run()
W1029 10:48:20.272866 32687 init.cc:212] @ 0x7f0efe5ed038 paddle::framework::ParallelExecutor::Run()
W1029 10:48:20.273551 32687 init.cc:212] @ 0x7f0efe3d0e78 _ZZN8pybind1112cpp_function10initializeIZN6paddle6pybindL22pybind11_init_core_avxERNS_6moduleEEUlRNS2_9framework16ParallelExecutorERKSt6vectorISsSaISsEEE188_S9_INS6_9LoDTensorESaISF_EEIS8_SD_EINS_4nameENS_9is_methodENS_7siblingEEEEvOT_PFT0_DpT1_EDpRKT2_ENUlRNS_6detail13function_callEE1_4_FUNESY_
W1029 10:48:20.274988 32687 init.cc:212] @ 0x7f0efe41af56 pybind11::cpp_function::dispatcher()
W1029 10:48:20.276706 32687 init.cc:212] @ 0x7f0f71808cc8 PyEval_EvalFrameEx
W1029 10:48:20.278395 32687 init.cc:212] @ 0x7f0f7180b35d PyEval_EvalCodeEx
W1029 10:48:20.280076 32687 init.cc:212] @ 0x7f0f71808d50 PyEval_EvalFrameEx
W1029 10:48:20.281765 32687 init.cc:212] @ 0x7f0f7180b35d PyEval_EvalCodeEx
W1029 10:48:20.283442 32687 init.cc:212] @ 0x7f0f71808d50 PyEval_EvalFrameEx
W1029 10:48:20.285133 32687 init.cc:212] @ 0x7f0f7180b35d PyEval_EvalCodeEx
W1029 10:48:20.286808 32687 init.cc:212] @ 0x7f0f71808d50 PyEval_EvalFrameEx
W1029 10:48:20.288502 32687 init.cc:212] @ 0x7f0f7180b35d PyEval_EvalCodeEx
W1029 10:48:20.290176 32687 init.cc:212] @ 0x7f0f71808d50 PyEval_EvalFrameEx
W1029 10:48:20.291870 32687 init.cc:212] @ 0x7f0f7180b35d PyEval_EvalCodeEx
W1029 10:48:20.293542 32687 init.cc:212] @ 0x7f0f7180b492 PyEval_EvalCode
W1029 10:48:20.295228 32687 init.cc:212] @ 0x7f0f718351a2 PyRun_FileExFlags
W1029 10:48:20.296922 32687 init.cc:212] @ 0x7f0f71836539 PyRun_SimpleFileExFlags
W1029 10:48:20.298590 32687 init.cc:212] @ 0x7f0f7184c1bd Py_Main
W1029 10:48:20.300307 32687 init.cc:212] @ 0x7f0f70a49bd5 __libc_start_main
W1029 10:48:20.300364 32687 init.cc:212] @ 0x4007a1 (unknown)
W1029 10:48:20.302006 32687 init.cc:212] @ 0x0 (unknown)
...@@ -23,6 +23,35 @@ from functools import partial
import paddle.fluid as fluid
import paddle.fluid.layers as layers
from paddle.fluid.layer_helper import LayerHelper as LayerHelper
def layer_norm(x, begin_norm_axis=1, epsilon=1e-6, param_attr=None, bias_attr=None):
helper = LayerHelper('layer_norm', **locals())
mean = layers.reduce_mean(x, dim=begin_norm_axis, keep_dim=True)
shift_x = layers.elementwise_sub(x=x, y=mean, axis=0)
variance = layers.reduce_mean(layers.square(shift_x), dim=begin_norm_axis, keep_dim=True)
r_stdev = layers.rsqrt(variance + epsilon)
norm_x = layers.elementwise_mul(x=shift_x, y=r_stdev, axis=0)
param_shape = [reduce(lambda x, y: x * y, norm_x.shape[begin_norm_axis:])]
param_dtype = norm_x.dtype
scale = helper.create_parameter(
attr=param_attr,
shape=param_shape,
dtype=param_dtype,
default_initializer=fluid.initializer.Constant(1.))
bias = helper.create_parameter(
attr=bias_attr,
shape=param_shape,
dtype=param_dtype,
is_bias=True,
default_initializer=fluid.initializer.Constant(0.))
out = layers.elementwise_mul(x=norm_x, y=scale, axis=-1)
out = layers.elementwise_add(x=out, y=bias, axis=-1)
return out
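As a sanity check on the arithmetic above, here is a minimal NumPy sketch of the same computation for the common case where `begin_norm_axis` selects the last axis (the helper name and shapes here are illustrative, not part of the codebase):

```python
import numpy as np

def layer_norm_ref(x, scale, bias, epsilon=1e-6):
    # mean/variance over the normalized (last) axis, then rescale and shift
    mean = x.mean(axis=-1, keepdims=True)
    variance = ((x - mean) ** 2).mean(axis=-1, keepdims=True)
    norm_x = (x - mean) / np.sqrt(variance + epsilon)
    return norm_x * scale + bias

x = np.array([[1.0, 2.0, 3.0, 4.0]])
out = layer_norm_ref(x, scale=np.ones(4), bias=np.zeros(4))
```

With scale 1 and bias 0 each row comes out with roughly zero mean and unit variance, which is what the `reduce_mean`/`rsqrt` chain above computes before the learned `scale` and `bias` parameters are applied.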
def multi_head_attention(queries,
keys,
values,
...@@ -209,7 +238,7 @@ def pre_post_process_layer(prev_out, out, process_cmd, dropout_rate=0.,
out_dtype = out.dtype
if out_dtype == fluid.core.VarDesc.VarType.FP16:
out = layers.cast(x=out, dtype="float32")
out = layer_norm(
out,
begin_norm_axis=len(out.shape) - 1,
param_attr=fluid.ParamAttr(
......
...@@ -36,6 +36,8 @@ class Reader(reader):
self._batch_size = config['batch_size']
self._max_seq_len = config['max_seq_len']
self._num_classes = config['n_classes']
if phase == 'train':
self._input_file = config['train_file']
self._num_epochs = None  # prevent the iterator from terminating
...@@ -91,6 +93,7 @@ class Reader(reader):
return outputs
for batch in self._data_generator():
print(batch)
yield list_to_dict(batch)
def get_epoch_outputs(self):
......
...@@ -15,6 +15,7 @@
from paddlepalm.interface import reader
from paddlepalm.reader.utils.reader4ernie import MaskLMReader
import numpy as np
class Reader(reader):
...@@ -81,6 +82,8 @@ class Reader(reader):
return outputs
for batch in self._data_generator():
# print(np.shape(list_to_dict(batch)['token_ids']))
# print(list_to_dict(batch)['mask_label'].tolist())
yield list_to_dict(batch)
def get_epoch_outputs(self):
......
# -*- coding: UTF-8 -*-
# Copyright (c) 2019 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from paddlepalm.interface import reader
from paddlepalm.utils.textprocess_helper import is_whitespace
from paddlepalm.reader.utils.mrqa_helper import MRQAExample, MRQAFeature
import paddlepalm.tokenizer.bert_tokenizer as tokenization
import collections
import json
import random
import six
import numpy as np
class Reader(reader):
def __init__(self, config, phase='train', dev_count=1, print_prefix=''):
"""
Args:
phase: train, eval, pred
"""
self._is_training = phase == 'train'
self._dev_count = dev_count
self._in_tokens = config.get('in_tokens', False)
self.print_prefix = print_prefix
self._tokenizer = tokenization.FullTokenizer(
vocab_file=config['vocab_path'], do_lower_case=config.get('do_lower_case', False))
self._max_seq_length = config['max_seq_len']
self._doc_stride = config['doc_stride']
self._max_query_length = config['max_query_len']
if phase == 'train':
self._input_file = config['train_file']
self._num_epochs = config['num_epochs']
self._shuffle = config.get('shuffle', False)
self._shuffle_buffer = config.get('shuffle_buffer', 5000)
elif phase == 'eval':
self._input_file = config['dev_file']
self._num_epochs = 1
self._shuffle = False
elif phase == 'pred':
self._input_file = config['predict_file']
self._num_epochs = 1
self._shuffle = False
# self._batch_size =
self._batch_size = config['batch_size']
self._pred_batch_size = config.get('pred_batch_size', self._batch_size)
self._print_first_n = config.get('print_first_n', 1)
self._with_negative = config.get('with_negative', False)
self._sample_rate = config.get('sample_rate', 0.02)
# TODO: without slide window version
self._with_slide_window = config.get('with_slide_window', False)
self.vocab = self._tokenizer.vocab
self.vocab_size = len(self.vocab)
self.pad_id = self.vocab["[PAD]"]
self.cls_id = self.vocab["[CLS]"]
self.sep_id = self.vocab["[SEP]"]
self.mask_id = self.vocab["[MASK]"]
self.current_train_example = -1
self.num_train_examples = -1
self.current_train_epoch = -1
self.n_examples = None
print(print_prefix + 'reading raw data...')
with open(self._input_file, "r") as f:
self.raw_data = json.load(f)["data"]
print(print_prefix + 'done!')
@property
def outputs_attr(self):
if self._is_training:
return {"token_ids": [[-1, self._max_seq_length, 1], 'int64'],
"position_ids": [[-1, self._max_seq_length, 1], 'int64'],
"segment_ids": [[-1, self._max_seq_length, 1], 'int64'],
"input_mask": [[-1, self._max_seq_length, 1], 'float32'],
"start_positions": [[-1, self._max_seq_length, 1], 'int64'],
"end_positions": [[-1, self._max_seq_length, 1], 'int64']
}
else:
return {"token_ids": [[-1, self._max_seq_length, 1], 'int64'],
"position_ids": [[-1, self._max_seq_length, 1], 'int64'],
"segment_ids": [[-1, self._max_seq_length, 1], 'int64'],
"input_mask": [[-1, self._max_seq_length, 1], 'float32'],
"unique_ids": [[-1, 1], 'int64']
}
def iterator(self):
features = []
for i in range(self._num_epochs):
if self._is_training:
print(self.print_prefix + '{} epoch {} {}'.format('-'*16, i, '-'*16))
example_id = 0
feature_id = 1000000000
for entry in self.raw_data:
for paragraph in entry["paragraphs"]:
raw = {'context': paragraph["context"], 'qa_list': paragraph["qas"]}
examples = _raw_to_examples(raw['context'], raw['qa_list'], is_training=self._is_training)
for example in examples:
features.extend(_example_to_features(example, example_id, self._tokenizer, \
self._max_seq_length, self._doc_stride, self._max_query_length, \
id_offset=1000000000+len(features), is_training=self._is_training))
if len(features) >= self._batch_size * self._dev_count:
for batch, total_token_num in _features_to_batches( \
features[:self._batch_size * self._dev_count], \
self._batch_size, in_tokens=self._in_tokens):
temp = prepare_batch_data(batch, total_token_num, \
max_len=self._max_seq_length, voc_size=-1, \
pad_id=self.pad_id, cls_id=self.cls_id, sep_id=self.sep_id, mask_id=-1, \
return_input_mask=True, return_max_len=False, return_num_token=False)
if self._is_training:
tok_ids, pos_ids, seg_ids, input_mask, start_positions, end_positions = temp
yield {"token_ids": tok_ids, "position_ids": pos_ids, "segment_ids": seg_ids, "input_mask": input_mask, "start_positions": start_positions, 'end_positions': end_positions}
else:
tok_ids, pos_ids, seg_ids, input_mask, unique_ids = temp
yield {"token_ids": tok_ids, "position_ids": pos_ids, "segment_ids": seg_ids, "input_mask": input_mask, "unique_ids": unique_ids}
features = features[self._batch_size * self._dev_count:]
example_id += 1
# The last batch may be discarded when running with distributed prediction, so we build some fake batches for the last prediction step.
if not self._is_training and len(features) > 0:
pred_batches = []
for batch, total_token_num in _features_to_batches( \
features[:self._batch_size * self._dev_count], \
self._batch_size, in_tokens=self._in_tokens):
pred_batches.append(prepare_batch_data(batch, total_token_num, max_len=self._max_seq_length, voc_size=-1,
pad_id=self.pad_id, cls_id=self.cls_id, sep_id=self.sep_id, mask_id=-1, \
return_input_mask=True, return_max_len=False, return_num_token=False))
fake_batch = pred_batches[-1]
fake_batch = fake_batch[:-1] + [np.array([-1]*len(fake_batch[0]))]
pred_batches = pred_batches + [fake_batch] * (self._dev_count - len(pred_batches))
for batch in pred_batches:
yield batch
@property
def num_examples(self):
if self.n_examples is None:
self.n_examples = _estimate_runtime_examples(self.raw_data, self._sample_rate, self._tokenizer, \
self._max_seq_length, self._doc_stride, self._max_query_length, \
remove_impossible_questions=True, filter_invalid_spans=True)
return self.n_examples
# return math.ceil(n_examples * self._num_epochs / float(self._batch_size * self._dev_count))
def _raw_to_examples(context, qa_list, is_training=True, remove_impossible_questions=True, filter_invalid_spans=True):
"""
Args:
context: (str) the paragraph that provide information for QA
qa_list: (list) nested dict. Each element in qa_list should contain at least 'id' and 'question'. And the ....
"""
examples = []
doc_tokens = []
char_to_word_offset = []
prev_is_whitespace = True
for c in context:
if is_whitespace(c):
prev_is_whitespace = True
else:
if prev_is_whitespace:
doc_tokens.append(c)
else:
doc_tokens[-1] += c
prev_is_whitespace = False
char_to_word_offset.append(len(doc_tokens) - 1)
for qa in qa_list:
qas_id = qa["id"]
question_text = qa["question"]
start_position = None
end_position = None
orig_answer_text = None
is_impossible = False
if is_training:
assert len(qa["answers"]) == 1, "For training, each question should have exactly 1 answer."
if ('is_impossible' in qa) and (qa["is_impossible"]):
if remove_impossible_questions or filter_invalid_spans:
continue
else:
start_position = -1
end_position = -1
orig_answer_text = ""
is_impossible = True
else:
answer = qa["answers"][0]
orig_answer_text = answer["text"]
answer_offset = answer["answer_start"]
answer_length = len(orig_answer_text)
start_position = char_to_word_offset[answer_offset]
end_position = char_to_word_offset[answer_offset +
answer_length - 1]
# remove corrupt samples
actual_text = " ".join(doc_tokens[start_position:(
end_position + 1)])
cleaned_answer_text = " ".join(
tokenization.whitespace_tokenize(orig_answer_text))
if actual_text.find(cleaned_answer_text) == -1:
print("Could not find answer: '%s' vs. '%s'" % (actual_text, cleaned_answer_text))
continue
examples.append(MRQAExample(
qas_id=qas_id,
question_text=question_text,
doc_tokens=doc_tokens,
orig_answer_text=orig_answer_text,
start_position=start_position,
end_position=end_position,
is_impossible=is_impossible))
return examples
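The whitespace-splitting pass in `_raw_to_examples` is the piece that lets a character-level `answer_start` be projected onto a word index; a self-contained sketch of just that pass (the function name is illustrative):

```python
def split_with_offsets(context):
    # Build whitespace-delimited words plus a char-index -> word-index map.
    doc_tokens, char_to_word_offset = [], []
    prev_is_whitespace = True
    for c in context:
        if c in " \t\r\n":
            prev_is_whitespace = True
        else:
            if prev_is_whitespace:
                doc_tokens.append(c)   # start a new word
            else:
                doc_tokens[-1] += c    # extend the current word
            prev_is_whitespace = False
        char_to_word_offset.append(len(doc_tokens) - 1)
    return doc_tokens, char_to_word_offset

tokens, offsets = split_with_offsets("the cat sat")
```

An answer starting at character 4 (`cat`) maps to word index `offsets[4] == 1`, which is exactly how `start_position` and `end_position` are derived above.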
def _example_to_features(example, example_id, tokenizer, max_seq_length, doc_stride, max_query_length, id_offset, is_training):
features = []
query_tokens = tokenizer.tokenize(example.question_text)
if len(query_tokens) > max_query_length:
query_tokens = query_tokens[0:max_query_length]
tok_to_orig_index = []
orig_to_tok_index = []
all_doc_tokens = []
for (i, token) in enumerate(example.doc_tokens):
orig_to_tok_index.append(len(all_doc_tokens))
sub_tokens = tokenizer.tokenize(token)
for sub_token in sub_tokens:
tok_to_orig_index.append(i)
all_doc_tokens.append(sub_token)
tok_start_position = None
tok_end_position = None
if is_training and example.is_impossible:
tok_start_position = -1
tok_end_position = -1
if is_training and not example.is_impossible:
tok_start_position = orig_to_tok_index[example.start_position]
if example.end_position < len(example.doc_tokens) - 1:
tok_end_position = orig_to_tok_index[example.end_position +
1] - 1
else:
tok_end_position = len(all_doc_tokens) - 1
(tok_start_position, tok_end_position) = _improve_answer_span(
all_doc_tokens, tok_start_position, tok_end_position, tokenizer,
example.orig_answer_text)
# The -3 accounts for [CLS], [SEP] and [SEP]
max_tokens_for_doc = max_seq_length - len(query_tokens) - 3
# We can have documents that are longer than the maximum sequence length.
# To deal with this we do a sliding window approach, where we take chunks
# of the up to our max length with a stride of `doc_stride`.
_DocSpan = collections.namedtuple( # pylint: disable=invalid-name
"DocSpan", ["start", "length"])
doc_spans = []
start_offset = 0
while start_offset < len(all_doc_tokens):
length = len(all_doc_tokens) - start_offset
if length > max_tokens_for_doc:
length = max_tokens_for_doc
doc_spans.append(_DocSpan(start=start_offset, length=length))
if start_offset + length == len(all_doc_tokens):
break
start_offset += min(length, doc_stride)
for (doc_span_index, doc_span) in enumerate(doc_spans):
tokens = []
token_to_orig_map = {}
token_is_max_context = {}
segment_ids = []
tokens.append("[CLS]")
segment_ids.append(0)
for token in query_tokens:
tokens.append(token)
segment_ids.append(0)
tokens.append("[SEP]")
segment_ids.append(0)
for i in range(doc_span.length):
split_token_index = doc_span.start + i
token_to_orig_map[len(tokens)] = tok_to_orig_index[
split_token_index]
is_max_context = _check_is_max_context(
doc_spans, doc_span_index, split_token_index)
token_is_max_context[len(tokens)] = is_max_context
tokens.append(all_doc_tokens[split_token_index])
segment_ids.append(1)
tokens.append("[SEP]")
segment_ids.append(1)
input_ids = tokenizer.convert_tokens_to_ids(tokens)
# The mask has 1 for real tokens and 0 for padding tokens. Only real
# tokens are attended to.
input_mask = [1] * len(input_ids)
# Zero-pad up to the sequence length.
#while len(input_ids) < max_seq_length:
# input_ids.append(0)
# input_mask.append(0)
# segment_ids.append(0)
#assert len(input_ids) == max_seq_length
#assert len(input_mask) == max_seq_length
#assert len(segment_ids) == max_seq_length
start_position = None
end_position = None
if is_training and not example.is_impossible:
# For training, if our document chunk does not contain an annotation
# we throw it out, since there is nothing to predict.
doc_start = doc_span.start
doc_end = doc_span.start + doc_span.length - 1
out_of_span = False
if not (tok_start_position >= doc_start and
tok_end_position <= doc_end):
out_of_span = True
if out_of_span:
start_position = 0
end_position = 0
continue
else:
doc_offset = len(query_tokens) + 2
start_position = tok_start_position - doc_start + doc_offset
end_position = tok_end_position - doc_start + doc_offset
if is_training and example.is_impossible:
start_position = 0
end_position = 0
def format_print():
print("*** Example ***")
print("unique_id: %s" % (id_offset))
print("example_index: %s" % (example_id))
print("doc_span_index: %s" % (doc_span_index))
print("tokens: %s" % " ".join(
[tokenization.printable_text(x) for x in tokens]))
print("token_to_orig_map: %s" % " ".join([
"%d:%d" % (x, y)
for (x, y) in six.iteritems(token_to_orig_map)
]))
print("token_is_max_context: %s" % " ".join([
"%d:%s" % (x, y)
for (x, y) in six.iteritems(token_is_max_context)
]))
print("input_ids: %s" % " ".join([str(x) for x in input_ids]))
print("input_mask: %s" % " ".join([str(x) for x in input_mask]))
print("segment_ids: %s" %
" ".join([str(x) for x in segment_ids]))
if is_training and example.is_impossible:
print("impossible example")
if is_training and not example.is_impossible:
answer_text = " ".join(tokens[start_position:(end_position +
1)])
print("start_position: %d" % (start_position))
print("end_position: %d" % (end_position))
print("answer: %s" %
(tokenization.printable_text(answer_text)))
if example_id == 0 and doc_span_index == 0:
format_print()
features.append(MRQAFeature(
unique_id=id_offset,
example_index=example_id,
doc_span_index=doc_span_index,
tokens=tokens,
token_to_orig_map=token_to_orig_map,
token_is_max_context=token_is_max_context,
input_ids=input_ids,
input_mask=input_mask,
segment_ids=segment_ids,
start_position=start_position,
end_position=end_position,
is_impossible=example.is_impossible))
id_offset += 1
return features
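The sliding-window loop above ("chunks of up to our max length with a stride of `doc_stride`") can be isolated; a rough sketch with illustrative names:

```python
import collections

DocSpan = collections.namedtuple("DocSpan", ["start", "length"])

def make_doc_spans(n_tokens, max_tokens_for_doc, doc_stride):
    # Overlapping (start, length) windows covering all n_tokens wordpieces.
    spans, start_offset = [], 0
    while start_offset < n_tokens:
        length = min(n_tokens - start_offset, max_tokens_for_doc)
        spans.append(DocSpan(start=start_offset, length=length))
        if start_offset + length == n_tokens:
            break
        start_offset += min(length, doc_stride)
    return spans

spans = make_doc_spans(10, max_tokens_for_doc=6, doc_stride=3)
```

A 10-token document with a 6-token window and stride 3 yields spans (0,6), (3,6), (6,4); tokens such as index 5 then appear in more than one span, which is why `_check_is_max_context` below is needed.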
def _features_to_batches(features, batch_size, in_tokens):
batch, total_token_num, max_len = [], 0, 0
for (index, feature) in enumerate(features):
seq_len = len(feature.input_ids)
labels = [feature.unique_id
] if feature.start_position is None else [
feature.start_position, feature.end_position
]
example = [
feature.input_ids, feature.segment_ids, range(seq_len)
] + labels
max_len = max(max_len, seq_len)
if in_tokens:
to_append = (len(batch) + 1) * max_len <= batch_size
else:
to_append = len(batch) < batch_size
if to_append:
batch.append(example)
total_token_num += seq_len
else:
yield batch, total_token_num
batch, total_token_num, max_len = [example
], seq_len, seq_len
if len(batch) > 0:
yield batch, total_token_num
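When `in_tokens` is set, the generator above packs features until the padded batch size (`(len(batch) + 1) * max_len`) would exceed the token budget; a simplified sketch of just that packing rule (names are illustrative):

```python
def pack_by_tokens(seq_lens, token_budget):
    # Group sequence lengths so num_seqs * running_max_len stays within budget.
    batches, batch, max_len = [], [], 0
    for seq_len in seq_lens:
        max_len = max(max_len, seq_len)
        if (len(batch) + 1) * max_len <= token_budget:
            batch.append(seq_len)
        else:
            batches.append(batch)
            batch, max_len = [seq_len], seq_len
    if batch:
        batches.append(batch)
    return batches

batches = pack_by_tokens([4, 5, 8, 2], token_budget=12)
```

Note that the budget is measured against the padded size, so one long sequence (here the length-8 one) can force a batch of its own.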
def _estimate_runtime_examples(data, sample_rate, tokenizer, \
max_seq_length, doc_stride, max_query_length, \
remove_impossible_questions=True, filter_invalid_spans=True):
"""Estimate the number of runtime examples, which may differ from the number of raw samples due to the sliding-window operation, answer filtering, etc.
This is useful for computing correct warmup steps for training."""
assert sample_rate > 0.0 and sample_rate <= 1.0, "sample_rate must be in (0.0, 1.0]"
num_raw_examples = 0
for entry in data:
for paragraph in entry["paragraphs"]:
paragraph_text = paragraph["context"]
for qa in paragraph["qas"]:
num_raw_examples += 1
# print("num raw examples:{}".format(num_raw_examples))
def is_whitespace(c):
if c == " " or c == "\t" or c == "\r" or c == "\n" or ord(c) == 0x202F:
return True
return False
sampled_examples = []
first_samp = True
for entry in data:
for paragraph in entry["paragraphs"]:
doc_tokens = None
for qa in paragraph["qas"]:
if not first_samp and random.random() > sample_rate and sample_rate < 1.0:
continue
if doc_tokens is None:
paragraph_text = paragraph["context"]
doc_tokens = []
char_to_word_offset = []
prev_is_whitespace = True
for c in paragraph_text:
if is_whitespace(c):
prev_is_whitespace = True
else:
if prev_is_whitespace:
doc_tokens.append(c)
else:
doc_tokens[-1] += c
prev_is_whitespace = False
char_to_word_offset.append(len(doc_tokens) - 1)
assert len(qa["answers"]) == 1, "For training, each question should have exactly 1 answer."
qas_id = qa["id"]
question_text = qa["question"]
start_position = None
end_position = None
orig_answer_text = None
is_impossible = False
if ('is_impossible' in qa) and (qa["is_impossible"]):
if remove_impossible_questions or filter_invalid_spans:
continue
else:
start_position = -1
end_position = -1
orig_answer_text = ""
is_impossible = True
else:
answer = qa["answers"][0]
orig_answer_text = answer["text"]
answer_offset = answer["answer_start"]
answer_length = len(orig_answer_text)
start_position = char_to_word_offset[answer_offset]
end_position = char_to_word_offset[answer_offset +
answer_length - 1]
# remove corrupt samples
actual_text = " ".join(doc_tokens[start_position:(
end_position + 1)])
cleaned_answer_text = " ".join(
tokenization.whitespace_tokenize(orig_answer_text))
if actual_text.find(cleaned_answer_text) == -1:
continue
example = MRQAExample(
qas_id=qas_id,
question_text=question_text,
doc_tokens=doc_tokens,
orig_answer_text=orig_answer_text,
start_position=start_position,
end_position=end_position,
is_impossible=is_impossible)
sampled_examples.append(example)
first_samp = False
runtime_sample_rate = len(sampled_examples) / float(num_raw_examples)
runtime_samp_cnt = 0
for example in sampled_examples:
query_tokens = tokenizer.tokenize(example.question_text)
if len(query_tokens) > max_query_length:
query_tokens = query_tokens[0:max_query_length]
tok_to_orig_index = []
orig_to_tok_index = []
all_doc_tokens = []
for (i, token) in enumerate(example.doc_tokens):
orig_to_tok_index.append(len(all_doc_tokens))
sub_tokens = tokenizer.tokenize(token)
for sub_token in sub_tokens:
tok_to_orig_index.append(i)
all_doc_tokens.append(sub_token)
tok_start_position = None
tok_end_position = None
tok_start_position = orig_to_tok_index[example.start_position]
if example.end_position < len(example.doc_tokens) - 1:
tok_end_position = orig_to_tok_index[example.end_position + 1] - 1
else:
tok_end_position = len(all_doc_tokens) - 1
(tok_start_position, tok_end_position) = _improve_answer_span(
all_doc_tokens, tok_start_position, tok_end_position, tokenizer,
example.orig_answer_text)
# The -3 accounts for [CLS], [SEP] and [SEP]
max_tokens_for_doc = max_seq_length - len(query_tokens) - 3
_DocSpan = collections.namedtuple( # pylint: disable=invalid-name
"DocSpan", ["start", "length"])
doc_spans = []
start_offset = 0
while start_offset < len(all_doc_tokens):
length = len(all_doc_tokens) - start_offset
if length > max_tokens_for_doc:
length = max_tokens_for_doc
doc_spans.append(_DocSpan(start=start_offset, length=length))
if start_offset + length == len(all_doc_tokens):
break
start_offset += min(length, doc_stride)
for (doc_span_index, doc_span) in enumerate(doc_spans):
doc_start = doc_span.start
doc_end = doc_span.start + doc_span.length - 1
if filter_invalid_spans and not (tok_start_position >= doc_start and tok_end_position <= doc_end):
continue
runtime_samp_cnt += 1
return int(runtime_samp_cnt/runtime_sample_rate)
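The final line is a plain scale-up estimate: count runtime examples in a random subset, then divide by the realized sampling fraction. In toy form (the numbers are illustrative):

```python
def estimate_total(sampled_runtime_count, realized_sample_rate):
    # e.g. 48 runtime examples found in a subset covering 2% of the raw QAs
    return int(sampled_runtime_count / realized_sample_rate)

estimate = estimate_total(48, 0.02)
```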
def _improve_answer_span(doc_tokens, input_start, input_end, tokenizer,
orig_answer_text):
"""Returns tokenized answer spans that better match the annotated answer."""
# The MRQA annotations are character based. We first project them to
# whitespace-tokenized words. But then after WordPiece tokenization, we can
# often find a "better match". For example:
#
# Question: What year was John Smith born?
# Context: The leader was John Smith (1895-1943).
# Answer: 1895
#
# The original whitespace-tokenized answer will be "(1895-1943).". However
# after tokenization, our tokens will be "( 1895 - 1943 ) .". So we can match
# the exact answer, 1895.
#
# However, this is not always possible. Consider the following:
#
# Question: What country is the top exporter of electronics?
# Context: The Japanese electronics industry is the largest in the world.
# Answer: Japan
#
# In this case, the annotator chose "Japan" as a character sub-span of
# the word "Japanese". Since our WordPiece tokenizer does not split
# "Japanese", we just use "Japanese" as the annotation. This is fairly rare
# in MRQA, but does happen.
tok_answer_text = " ".join(tokenizer.tokenize(orig_answer_text))
for new_start in range(input_start, input_end + 1):
for new_end in range(input_end, new_start - 1, -1):
text_span = " ".join(doc_tokens[new_start:(new_end + 1)])
if text_span == tok_answer_text:
return (new_start, new_end)
return (input_start, input_end)
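The John Smith example from the comment can be run through the same search; here the tokenizer dependency is replaced by pre-tokenized wordpieces, so this is a sketch of the span search only:

```python
def improve_answer_span(doc_tokens, input_start, input_end, tok_answer_text):
    # Try every sub-span of the whitespace-based span, widest end first,
    # and return the first one that matches the tokenized answer exactly.
    for new_start in range(input_start, input_end + 1):
        for new_end in range(input_end, new_start - 1, -1):
            text_span = " ".join(doc_tokens[new_start:(new_end + 1)])
            if text_span == tok_answer_text:
                return (new_start, new_end)
    return (input_start, input_end)

# "(1895-1943)." tokenizes into pieces, so the answer "1895" gets its own token.
doc_tokens = ["John", "Smith", "(", "1895", "-", "1943", ")", "."]
span = improve_answer_span(doc_tokens, 2, 7, "1895")
```

The whitespace span covered the whole of "(1895-1943)." but the search tightens it to the single token "1895".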
def _check_is_max_context(doc_spans, cur_span_index, position):
"""Check if this is the 'max context' doc span for the token."""
# Because of the sliding window approach taken to scoring documents, a single
# token can appear in multiple documents. E.g.
# Doc: the man went to the store and bought a gallon of milk
# Span A: the man went to the
# Span B: to the store and bought
# Span C: and bought a gallon of
# ...
#
# Now the word 'bought' will have two scores from spans B and C. We only
# want to consider the score with "maximum context", which we define as
# the *minimum* of its left and right context (the *sum* of left and
# right context will always be the same, of course).
#
# In the example the maximum context for 'bought' would be span C since
# it has 1 left context and 3 right context, while span B has 4 left context
# and 0 right context.
best_score = None
best_span_index = None
for (span_index, doc_span) in enumerate(doc_spans):
end = doc_span.start + doc_span.length - 1
if position < doc_span.start:
continue
if position > end:
continue
num_left_context = position - doc_span.start
num_right_context = end - position
score = min(num_left_context,
num_right_context) + 0.01 * doc_span.length
if best_score is None or score > best_score:
best_score = score
best_span_index = span_index
return cur_span_index == best_span_index
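The 'bought' example in the comment can be checked directly: span B (`to the store and bought`) starts at word 3, span C (`and bought a gallon of`) at word 6, and 'bought' is word 7. A standalone sketch of the same scoring rule:

```python
import collections

DocSpan = collections.namedtuple("DocSpan", ["start", "length"])

def check_is_max_context(doc_spans, cur_span_index, position):
    # Score each covering span by min(left context, right context),
    # with a small tie-break favoring longer spans.
    best_score, best_span_index = None, None
    for span_index, doc_span in enumerate(doc_spans):
        end = doc_span.start + doc_span.length - 1
        if position < doc_span.start or position > end:
            continue
        score = min(position - doc_span.start, end - position) + 0.01 * doc_span.length
        if best_score is None or score > best_score:
            best_score, best_span_index = score, span_index
    return cur_span_index == best_span_index

spans = [DocSpan(start=3, length=5), DocSpan(start=6, length=5)]
```

Span C wins for 'bought' (1 word of left context and 3 of right, versus 4 and 0 for span B), matching the comment.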
...@@ -21,15 +21,29 @@ class TaskParadigm(task_paradigm):
'''
classification
'''
def __init__(self, config, phase, backbone_config=None):
self._is_training = phase == 'train'
self._hidden_size = backbone_config['hidden_size']
self.num_classes = config['n_classes']
if 'initializer_range' in config:
self._param_initializer = fluid.initializer.TruncatedNormal(
scale=config['initializer_range'])
else:
self._param_initializer = fluid.initializer.TruncatedNormal(
scale=backbone_config.get('initializer_range', 0.02))
if 'dropout_prob' in config:
self._dropout_prob = config['dropout_prob']
else:
self._dropout_prob = backbone_config.get('hidden_dropout_prob', 0.0)
@property @property
def inputs_attrs(self): def inputs_attrs(self):
return {'bakcbone': {"sentence_emb": [-1, self.sent_emb_size], 'float32']}, if self._is_training:
'reader': {"label_ids": [[-1, 1], 'int64']}} reader = {"label_ids": [[-1, 1], 'int64']}
else:
reader = {}
bb = {"sentence_embedding": [[-1, self._hidden_size], 'float32']}
return {'reader': reader, 'backbone': bb}
@property @property
def outputs_attrs(self): def outputs_attrs(self):
...@@ -39,22 +53,29 @@ class TaskParadigm(task_paradigm): ...@@ -39,22 +53,29 @@ class TaskParadigm(task_paradigm):
return {'logits': [-1, self.num_classes], 'float32'} return {'logits': [-1, self.num_classes], 'float32'}
def build(self, **inputs): def build(self, **inputs):
sent_emb = inputs['backbone']['sentence_emb'] sent_emb = inputs['backbone']['sentence_embedding']
label_ids = inputs['reader']['label_ids'] label_ids = inputs['reader']['label_ids']
if self._is_training:
cls_feats = fluid.layers.dropout(
x=sent_emb,
dropout_prob=self._dropout_prob,
dropout_implementation="upscale_in_train")
logits = fluid.layers.fc( logits = fluid.layers.fc(
input=ent_emb input=sent_emb,
size=self.num_classes, size=self.num_classes,
param_attr=fluid.ParamAttr( param_attr=fluid.ParamAttr(
name="cls_out_w", name="cls_out_w",
initializer=fluid.initializer.TruncatedNormal(scale=0.1)), initializer=self._param_initializer),
bias_attr=fluid.ParamAttr( bias_attr=fluid.ParamAttr(
name="cls_out_b", initializer=fluid.initializer.Constant(0.))) name="cls_out_b", initializer=fluid.initializer.Constant(0.)))
loss = fluid.layers.softmax_with_cross_entropy(
logits=logits, label=label_ids)
loss = layers.mean(loss)
if self._is_training: if self._is_training:
loss = fluid.layers.softmax_with_cross_entropy(
logits=logits, label=label_ids)
loss = layers.mean(loss)
return {"loss": loss} return {"loss": loss}
else: else:
return {"logits":logits} return {"logits":logits}
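The constructor above resolves each hyperparameter by checking the task config first and only then falling back to the backbone config (and finally a hard default). That precedence rule can be factored into a small standalone helper; a sketch, where `resolve_setting` is a hypothetical name rather than part of this codebase:

```python
def resolve_setting(task_config, backbone_config, task_key,
                    backbone_key, default):
    """Task-level config wins; otherwise fall back to the backbone
    config; otherwise use the hard-coded default."""
    if task_key in task_config:
        return task_config[task_key]
    return (backbone_config or {}).get(backbone_key, default)

# Mirrors the dropout_prob lookup in __init__.
task_cfg = {'n_classes': 3}
bb_cfg = {'hidden_size': 768, 'hidden_dropout_prob': 0.1}
assert resolve_setting(task_cfg, bb_cfg,
                       'dropout_prob', 'hidden_dropout_prob', 0.0) == 0.1
assert resolve_setting({'dropout_prob': 0.2}, bb_cfg,
                       'dropout_prob', 'hidden_dropout_prob', 0.0) == 0.2
assert resolve_setting({}, None,
                       'dropout_prob', 'hidden_dropout_prob', 0.0) == 0.0
```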
@@ -24,6 +24,17 @@ class TaskParadigm(task_paradigm):
    def __init__(self, config, phase, backbone_config=None):
        self._is_training = phase == 'train'
        self._hidden_size = backbone_config['hidden_size']

        if 'initializer_range' in config:
            self._param_initializer = fluid.initializer.TruncatedNormal(
                scale=config['initializer_range'])
        else:
            self._param_initializer = fluid.initializer.TruncatedNormal(
                scale=backbone_config.get('initializer_range', 0.02))
        if 'dropout_prob' in config:
            self._dropout_prob = config['dropout_prob']
        else:
            self._dropout_prob = backbone_config.get('hidden_dropout_prob', 0.0)

    @property
    def inputs_attrs(self):
@@ -46,16 +57,18 @@ class TaskParadigm(task_paradigm):
        labels = inputs["reader"]["label_ids"]
        cls_feats = inputs["backbone"]["sentence_pair_embedding"]

        if self._is_training:
            cls_feats = fluid.layers.dropout(
                x=cls_feats,
                dropout_prob=self._dropout_prob,
                dropout_implementation="upscale_in_train")

        logits = fluid.layers.fc(
            input=cls_feats,
            size=2,
            param_attr=fluid.ParamAttr(
                name="cls_out_w",
                initializer=self._param_initializer),
            bias_attr=fluid.ParamAttr(
                name="cls_out_b",
                initializer=fluid.initializer.Constant(0.)))
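Both paradigms apply dropout only when `self._is_training` is set, with `dropout_implementation="upscale_in_train"`, i.e. inverted dropout: surviving activations are scaled by 1/(1-p) at train time so that inference needs no rescaling. A pure-Python sketch of those semantics (an illustration, not Paddle's implementation):

```python
import random

def dropout_upscale_in_train(xs, p, training, seed=0):
    # Inverted dropout: zero each unit with probability p and scale
    # survivors by 1/(1-p); at inference time it is the identity.
    if not training or p == 0.0:
        return list(xs)
    rnd = random.Random(seed)
    return [x / (1.0 - p) if rnd.random() >= p else 0.0 for x in xs]

xs = [1.0] * 8
# Inference: no-op, activations pass through unchanged.
assert dropout_upscale_in_train(xs, 0.5, training=False) == xs
# Training with p=0.5: every unit is either dropped or doubled.
assert all(y in (0.0, 2.0)
           for y in dropout_upscale_in_train(xs, 0.5, training=True))
```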
...
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
while true
do
    python demo2.py
done
export CUDA_VISIBLE_DEVICES=0,1
python demo3.py