未验证 提交 39d9e6b4 编写于 作者: S Shengliang Guan 提交者: GitHub

Merge pull request #3963 from taosdata/feature/crash_gen

Refactoring and enhancing crash_gen tool
......@@ -54,6 +54,7 @@ export PYTHONPATH=$(pwd)/../../src/connector/python/linux/python3:$(pwd)
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$LIB_DIR
# Now we are all let, and let's see if we can find a crash. Note we pass all params
CRASH_GEN_EXEC=crash_gen_bootstrap.py
if [[ $1 == '--valgrind' ]]; then
shift
export PYTHONMALLOC=malloc
......@@ -66,14 +67,14 @@ if [[ $1 == '--valgrind' ]]; then
--leak-check=yes \
--suppressions=crash_gen/valgrind_taos.supp \
$PYTHON_EXEC \
./crash_gen/crash_gen.py $@ > $VALGRIND_OUT 2> $VALGRIND_ERR
$CRASH_GEN_EXEC $@ > $VALGRIND_OUT 2> $VALGRIND_ERR
elif [[ $1 == '--helgrind' ]]; then
shift
valgrind \
--tool=helgrind \
$PYTHON_EXEC \
./crash_gen/crash_gen.py $@
$CRASH_GEN_EXEC $@
else
$PYTHON_EXEC ./crash_gen/crash_gen.py $@
$PYTHON_EXEC $CRASH_GEN_EXEC $@
fi
<center><h1>User's Guide to the Crash_Gen Tool</h1></center>
# Introduction
To effectively test and debug our TDengine product, we have developed a simple tool to
exercise various functions of the system in a randomized fashion, hoping to expose
maximum number of problems, hopefully without a pre-determined scenario.
# Preparation
To run this tool, please ensure the followed preparation work is done first.
1. Fetch a copy of the TDengine source code, and build it successfully in the `build/`
directory
1. Ensure that the system has Python3.8 or above properly installed. We use
Ubuntu 20.04LTS as our own development environment, and suggest you also use such
an environment if possible.
# Simple Execution
To run the tool with the simplest method, follow the steps below:
1. Open a terminal window, start the `taosd` service in the `build/` directory
(or however you prefer to start the `taosd` service)
1. Open another terminal window, go into the `tests/pytest/` directory, and
run `./crash_gen.sh -p -t 3 -s 10` (change the two parameters here as you wish)
1. Watch the output to the end and see if you get a `SUCCESS` or `FAILURE`
That's it!
# Running Clusters
This tool also makes it easy to test/verify the clustering capabilities of TDengine. You
can start a cluster quite easily with the following command:
```
$ cd tests/pytest/
$ ./crash_gen.sh -e -o 3
```
The `-e` option above tells the tool to start the service, and do not run any tests, while
the `-o 3` option tells the tool to start 3 DNodes and join them together in a cluster.
Obviously you can adjust the the number here.
## Behind the Scenes
When the tool runs a cluster, it users a number of directories, each holding the information
for a single DNode, see:
```
$ ls build/cluster*
build/cluster_dnode_0:
cfg data log
build/cluster_dnode_1:
cfg data log
build/cluster_dnode_2:
cfg data log
```
Therefore, when something goes wrong and you want to reset everything with the cluster, simple
erase all the files:
```
$ rm -rf build/cluster_dnode_*
```
## Addresses and Ports
The DNodes in the cluster all binds the the `127.0.0.1` IP address (for now anyway), and
uses port 6030 for the first DNode, and 6130 for the 2nd one, and so on.
## Testing Against a Cluster
In a separate terminal window, you can invoke the tool in client mode and test against
a cluster, such as:
```
$ ./crash_gen.sh -p -t 10 -s 100 -i 3
```
Here the `-i` option tells the tool to always create tables with 3 replicas, and run
all tests against such tables.
# Additional Features
The exhaustive features of the tool is available through the `-h` option:
```
$ ./crash_gen.sh -h
usage: crash_gen_bootstrap.py [-h] [-a] [-b MAX_DBS] [-c CONNECTOR_TYPE] [-d] [-e] [-g IGNORE_ERRORS] [-i MAX_REPLICAS] [-l] [-n] [-o NUM_DNODES] [-p] [-r]
[-s MAX_STEPS] [-t NUM_THREADS] [-v] [-x]
TDengine Auto Crash Generator (PLEASE NOTICE the Prerequisites Below)
---------------------------------------------------------------------
1. You build TDengine in the top level ./build directory, as described in offical docs
2. You run the server there before this script: ./build/bin/taosd -c test/cfg
optional arguments:
-h, --help show this help message and exit
-a, --auto-start-service
Automatically start/stop the TDengine service (default: false)
-b MAX_DBS, --max-dbs MAX_DBS
Maximum number of DBs to keep, set to disable dropping DB. (default: 0)
-c CONNECTOR_TYPE, --connector-type CONNECTOR_TYPE
Connector type to use: native, rest, or mixed (default: 10)
-d, --debug Turn on DEBUG mode for more logging (default: false)
-e, --run-tdengine Run TDengine service in foreground (default: false)
-g IGNORE_ERRORS, --ignore-errors IGNORE_ERRORS
Ignore error codes, comma separated, 0x supported (default: None)
-i MAX_REPLICAS, --max-replicas MAX_REPLICAS
Maximum number of replicas to use, when testing against clusters. (default: 1)
-l, --larger-data Write larger amount of data during write operations (default: false)
-n, --dynamic-db-table-names
Use non-fixed names for dbs/tables, useful for multi-instance executions (default: false)
-o NUM_DNODES, --num-dnodes NUM_DNODES
Number of Dnodes to initialize, used with -e option. (default: 1)
-p, --per-thread-db-connection
Use a single shared db connection (default: false)
-r, --record-ops Use a pair of always-fsynced fils to record operations performing + performed, for power-off tests (default: false)
-s MAX_STEPS, --max-steps MAX_STEPS
Maximum number of steps to run (default: 100)
-t NUM_THREADS, --num-threads NUM_THREADS
Number of threads to run (default: 10)
-v, --verify-data Verify data written in a number of places by reading back (default: false)
-x, --continue-on-exception
Continue execution after encountering unexpected/disallowed errors/exceptions (default: false)
```
此差异已折叠。
from __future__ import annotations
import sys
import time
import threading
import requests
from requests.auth import HTTPBasicAuth
import taos
from util.sql import *
from util.cases import *
from util.dnodes import *
from util.log import *
from .misc import Logging, CrashGenError, Helper, Dice
import os
import datetime
# from .service_manager import TdeInstance
class DbConn:
TYPE_NATIVE = "native-c"
TYPE_REST = "rest-api"
TYPE_INVALID = "invalid"
@classmethod
def create(cls, connType, dbTarget):
if connType == cls.TYPE_NATIVE:
return DbConnNative(dbTarget)
elif connType == cls.TYPE_REST:
return DbConnRest(dbTarget)
else:
raise RuntimeError(
"Unexpected connection type: {}".format(connType))
@classmethod
def createNative(cls, dbTarget) -> DbConn:
return cls.create(cls.TYPE_NATIVE, dbTarget)
@classmethod
def createRest(cls, dbTarget) -> DbConn:
return cls.create(cls.TYPE_REST, dbTarget)
def __init__(self, dbTarget):
self.isOpen = False
self._type = self.TYPE_INVALID
self._lastSql = None
self._dbTarget = dbTarget
def __repr__(self):
return "[DbConn: type={}, target={}]".format(self._type, self._dbTarget)
def getLastSql(self):
return self._lastSql
def open(self):
if (self.isOpen):
raise RuntimeError("Cannot re-open an existing DB connection")
# below implemented by child classes
self.openByType()
Logging.debug("[DB] data connection opened: {}".format(self))
self.isOpen = True
def close(self):
raise RuntimeError("Unexpected execution, should be overriden")
def queryScalar(self, sql) -> int:
return self._queryAny(sql)
def queryString(self, sql) -> str:
return self._queryAny(sql)
def _queryAny(self, sql): # actual query result as an int
if (not self.isOpen):
raise RuntimeError("Cannot query database until connection is open")
nRows = self.query(sql)
if nRows != 1:
raise taos.error.ProgrammingError(
"Unexpected result for query: {}, rows = {}".format(sql, nRows),
(0x991 if nRows==0 else 0x992)
)
if self.getResultRows() != 1 or self.getResultCols() != 1:
raise RuntimeError("Unexpected result set for query: {}".format(sql))
return self.getQueryResult()[0][0]
def use(self, dbName):
self.execute("use {}".format(dbName))
def existsDatabase(self, dbName: str):
''' Check if a certain database exists '''
self.query("show databases")
dbs = [v[0] for v in self.getQueryResult()] # ref: https://stackoverflow.com/questions/643823/python-list-transformation
# ret2 = dbName in dbs
# print("dbs = {}, str = {}, ret2={}, type2={}".format(dbs, dbName,ret2, type(dbName)))
return dbName in dbs # TODO: super weird type mangling seen, once here
def hasTables(self):
return self.query("show tables") > 0
def execute(self, sql):
''' Return the number of rows affected'''
raise RuntimeError("Unexpected execution, should be overriden")
def safeExecute(self, sql):
'''Safely execute any SQL query, returning True/False upon success/failure'''
try:
self.execute(sql)
return True # ignore num of results, return success
except taos.error.ProgrammingError as err:
return False # failed, for whatever TAOS reason
# Not possile to reach here, non-TAOS exception would have been thrown
def query(self, sql) -> int: # return num rows returned
''' Return the number of rows affected'''
raise RuntimeError("Unexpected execution, should be overriden")
def openByType(self):
raise RuntimeError("Unexpected execution, should be overriden")
def getQueryResult(self):
raise RuntimeError("Unexpected execution, should be overriden")
def getResultRows(self):
raise RuntimeError("Unexpected execution, should be overriden")
def getResultCols(self):
raise RuntimeError("Unexpected execution, should be overriden")
# Sample: curl -u root:taosdata -d "show databases" localhost:6020/rest/sql
class DbConnRest(DbConn):
REST_PORT_INCREMENT = 11
def __init__(self, dbTarget: DbTarget):
super().__init__(dbTarget)
self._type = self.TYPE_REST
restPort = dbTarget.port + 11
self._url = "http://{}:{}/rest/sql".format(
dbTarget.hostAddr, dbTarget.port + self.REST_PORT_INCREMENT)
self._result = None
def openByType(self): # Open connection
pass # do nothing, always open
def close(self):
if (not self.isOpen):
raise RuntimeError("Cannot clean up database until connection is open")
# Do nothing for REST
Logging.debug("[DB] REST Database connection closed")
self.isOpen = False
def _doSql(self, sql):
self._lastSql = sql # remember this, last SQL attempted
try:
r = requests.post(self._url,
data = sql,
auth = HTTPBasicAuth('root', 'taosdata'))
except:
print("REST API Failure (TODO: more info here)")
raise
rj = r.json()
# Sanity check for the "Json Result"
if ('status' not in rj):
raise RuntimeError("No status in REST response")
if rj['status'] == 'error': # clearly reported error
if ('code' not in rj): # error without code
raise RuntimeError("REST error return without code")
errno = rj['code'] # May need to massage this in the future
# print("Raising programming error with REST return: {}".format(rj))
raise taos.error.ProgrammingError(
rj['desc'], errno) # todo: check existance of 'desc'
if rj['status'] != 'succ': # better be this
raise RuntimeError(
"Unexpected REST return status: {}".format(
rj['status']))
nRows = rj['rows'] if ('rows' in rj) else 0
self._result = rj
return nRows
def execute(self, sql):
if (not self.isOpen):
raise RuntimeError(
"Cannot execute database commands until connection is open")
Logging.debug("[SQL-REST] Executing SQL: {}".format(sql))
nRows = self._doSql(sql)
Logging.debug(
"[SQL-REST] Execution Result, nRows = {}, SQL = {}".format(nRows, sql))
return nRows
def query(self, sql): # return rows affected
return self.execute(sql)
def getQueryResult(self):
return self._result['data']
def getResultRows(self):
print(self._result)
raise RuntimeError("TBD") # TODO: finish here to support -v under -c rest
# return self._tdSql.queryRows
def getResultCols(self):
print(self._result)
raise RuntimeError("TBD")
# Duplicate code from TDMySQL, TODO: merge all this into DbConnNative
class MyTDSql:
# Class variables
_clsLock = threading.Lock() # class wide locking
longestQuery = None # type: str
longestQueryTime = 0.0 # seconds
lqStartTime = 0.0
# lqEndTime = 0.0 # Not needed, as we have the two above already
def __init__(self, hostAddr, cfgPath):
# Make the DB connection
self._conn = taos.connect(host=hostAddr, config=cfgPath)
self._cursor = self._conn.cursor()
self.queryRows = 0
self.queryCols = 0
self.affectedRows = 0
# def init(self, cursor, log=True):
# self.cursor = cursor
# if (log):
# caller = inspect.getframeinfo(inspect.stack()[1][0])
# self.cursor.log(caller.filename + ".sql")
def close(self):
self._cursor.close() # can we double close?
self._conn.close() # TODO: very important, cursor close does NOT close DB connection!
self._cursor.close()
def _execInternal(self, sql):
startTime = time.time()
ret = self._cursor.execute(sql)
# print("\nSQL success: {}".format(sql))
queryTime = time.time() - startTime
# Record the query time
cls = self.__class__
if queryTime > (cls.longestQueryTime + 0.01) :
with cls._clsLock:
cls.longestQuery = sql
cls.longestQueryTime = queryTime
cls.lqStartTime = startTime
return ret
def query(self, sql):
self.sql = sql
try:
self._execInternal(sql)
self.queryResult = self._cursor.fetchall()
self.queryRows = len(self.queryResult)
self.queryCols = len(self._cursor.description)
except Exception as e:
# caller = inspect.getframeinfo(inspect.stack()[1][0])
# args = (caller.filename, caller.lineno, sql, repr(e))
# tdLog.exit("%s(%d) failed: sql:%s, %s" % args)
raise
return self.queryRows
def execute(self, sql):
self.sql = sql
try:
self.affectedRows = self._execInternal(sql)
except Exception as e:
# caller = inspect.getframeinfo(inspect.stack()[1][0])
# args = (caller.filename, caller.lineno, sql, repr(e))
# tdLog.exit("%s(%d) failed: sql:%s, %s" % args)
raise
return self.affectedRows
class DbTarget:
def __init__(self, cfgPath, hostAddr, port):
self.cfgPath = cfgPath
self.hostAddr = hostAddr
self.port = port
def __repr__(self):
return "[DbTarget: cfgPath={}, host={}:{}]".format(
Helper.getFriendlyPath(self.cfgPath), self.hostAddr, self.port)
def getEp(self):
return "{}:{}".format(self.hostAddr, self.port)
class DbConnNative(DbConn):
# Class variables
_lock = threading.Lock()
# _connInfoDisplayed = False # TODO: find another way to display this
totalConnections = 0 # Not private
def __init__(self, dbTarget):
super().__init__(dbTarget)
self._type = self.TYPE_NATIVE
self._conn = None
# self._cursor = None
def openByType(self): # Open connection
# global gContainer
# tInst = tInst or gContainer.defTdeInstance # set up in ClientManager, type: TdeInstance
# cfgPath = self.getBuildPath() + "/test/cfg"
# cfgPath = tInst.getCfgDir()
# hostAddr = tInst.getHostAddr()
cls = self.__class__ # Get the class, to access class variables
with cls._lock: # force single threading for opening DB connections. # TODO: whaaat??!!!
dbTarget = self._dbTarget
# if not cls._connInfoDisplayed:
# cls._connInfoDisplayed = True # updating CLASS variable
Logging.debug("Initiating TAOS native connection to {}".format(dbTarget))
# Make the connection
# self._conn = taos.connect(host=hostAddr, config=cfgPath) # TODO: make configurable
# self._cursor = self._conn.cursor()
# Record the count in the class
self._tdSql = MyTDSql(dbTarget.hostAddr, dbTarget.cfgPath) # making DB connection
cls.totalConnections += 1
self._tdSql.execute('reset query cache')
# self._cursor.execute('use db') # do this at the beginning of every
# Open connection
# self._tdSql = MyTDSql()
# self._tdSql.init(self._cursor)
def close(self):
if (not self.isOpen):
raise RuntimeError("Cannot clean up database until connection is open")
self._tdSql.close()
# Decrement the class wide counter
cls = self.__class__ # Get the class, to access class variables
with cls._lock:
cls.totalConnections -= 1
Logging.debug("[DB] Database connection closed")
self.isOpen = False
def execute(self, sql):
if (not self.isOpen):
raise RuntimeError("Cannot execute database commands until connection is open")
Logging.debug("[SQL] Executing SQL: {}".format(sql))
self._lastSql = sql
nRows = self._tdSql.execute(sql)
Logging.debug(
"[SQL] Execution Result, nRows = {}, SQL = {}".format(
nRows, sql))
return nRows
def query(self, sql): # return rows affected
if (not self.isOpen):
raise RuntimeError(
"Cannot query database until connection is open")
Logging.debug("[SQL] Executing SQL: {}".format(sql))
self._lastSql = sql
nRows = self._tdSql.query(sql)
Logging.debug(
"[SQL] Query Result, nRows = {}, SQL = {}".format(
nRows, sql))
return nRows
# results are in: return self._tdSql.queryResult
def getQueryResult(self):
return self._tdSql.queryResult
def getResultRows(self):
return self._tdSql.queryRows
def getResultCols(self):
return self._tdSql.queryCols
class DbManager():
''' This is a wrapper around DbConn(), to make it easier to use.
TODO: rename this to DbConnManager
'''
def __init__(self, cType, dbTarget):
# self.tableNumQueue = LinearQueue() # TODO: delete?
# self.openDbServerConnection()
self._dbConn = DbConn.createNative(dbTarget) if (
cType == 'native') else DbConn.createRest(dbTarget)
try:
self._dbConn.open() # may throw taos.error.ProgrammingError: disconnected
except taos.error.ProgrammingError as err:
# print("Error type: {}, msg: {}, value: {}".format(type(err), err.msg, err))
if (err.msg == 'client disconnected'): # cannot open DB connection
print(
"Cannot establish DB connection, please re-run script without parameter, and follow the instructions.")
sys.exit(2)
else:
print("Failed to connect to DB, errno = {}, msg: {}"
.format(Helper.convertErrno(err.errno), err.msg))
raise
except BaseException:
print("[=] Unexpected exception")
raise
# Do this after dbConn is in proper shape
# Moved to Database()
# self._stateMachine = StateMechine(self._dbConn)
def getDbConn(self):
return self._dbConn
# TODO: not used any more, to delete
def pickAndAllocateTable(self): # pick any table, and "use" it
return self.tableNumQueue.pickAndAllocate()
# TODO: Not used any more, to delete
def addTable(self):
with self._lock:
tIndex = self.tableNumQueue.push()
return tIndex
# Not used any more, to delete
def releaseTable(self, i): # return the table back, so others can use it
self.tableNumQueue.release(i)
# TODO: not used any more, delete
def getTableNameToDelete(self):
tblNum = self.tableNumQueue.pop() # TODO: race condition!
if (not tblNum): # maybe false
return False
return "table_{}".format(tblNum)
def cleanUp(self):
self._dbConn.close()
import threading
import random
import logging
import os
class CrashGenError(Exception):
def __init__(self, msg=None, errno=None):
self.msg = msg
self.errno = errno
def __str__(self):
return self.msg
class LoggingFilter(logging.Filter):
def filter(self, record: logging.LogRecord):
if (record.levelno >= logging.INFO):
return True # info or above always log
# Commenting out below to adjust...
# if msg.startswith("[TRD]"):
# return False
return True
class MyLoggingAdapter(logging.LoggerAdapter):
def process(self, msg, kwargs):
return "[{}] {}".format(threading.get_ident() % 10000, msg), kwargs
# return '[%s] %s' % (self.extra['connid'], msg), kwargs
class Logging:
logger = None
@classmethod
def getLogger(cls):
return logger
@classmethod
def clsInit(cls, gConfig): # TODO: refactor away gConfig
if cls.logger:
return
# Logging Stuff
# global misc.logger
_logger = logging.getLogger('CrashGen') # real logger
_logger.addFilter(LoggingFilter())
ch = logging.StreamHandler()
_logger.addHandler(ch)
# Logging adapter, to be used as a logger
print("setting logger variable")
# global logger
cls.logger = MyLoggingAdapter(_logger, [])
if (gConfig.debug):
cls.logger.setLevel(logging.DEBUG) # default seems to be INFO
else:
cls.logger.setLevel(logging.INFO)
@classmethod
def info(cls, msg):
cls.logger.info(msg)
@classmethod
def debug(cls, msg):
cls.logger.debug(msg)
@classmethod
def warning(cls, msg):
cls.logger.warning(msg)
@classmethod
def error(cls, msg):
cls.logger.error(msg)
class Status:
STATUS_STARTING = 1
STATUS_RUNNING = 2
STATUS_STOPPING = 3
STATUS_STOPPED = 4
def __init__(self, status):
self.set(status)
def __repr__(self):
return "[Status: v={}]".format(self._status)
def set(self, status):
self._status = status
def get(self):
return self._status
def isStarting(self):
return self._status == Status.STATUS_STARTING
def isRunning(self):
# return self._thread and self._thread.is_alive()
return self._status == Status.STATUS_RUNNING
def isStopping(self):
return self._status == Status.STATUS_STOPPING
def isStopped(self):
return self._status == Status.STATUS_STOPPED
def isStable(self):
return self.isRunning() or self.isStopped()
# Deterministic random number generator
class Dice():
seeded = False # static, uninitialized
@classmethod
def seed(cls, s): # static
if (cls.seeded):
raise RuntimeError(
"Cannot seed the random generator more than once")
cls.verifyRNG()
random.seed(s)
cls.seeded = True # TODO: protect against multi-threading
@classmethod
def verifyRNG(cls): # Verify that the RNG is determinstic
random.seed(0)
x1 = random.randrange(0, 1000)
x2 = random.randrange(0, 1000)
x3 = random.randrange(0, 1000)
if (x1 != 864 or x2 != 394 or x3 != 776):
raise RuntimeError("System RNG is not deterministic")
@classmethod
def throw(cls, stop): # get 0 to stop-1
return cls.throwRange(0, stop)
@classmethod
def throwRange(cls, start, stop): # up to stop-1
if (not cls.seeded):
raise RuntimeError("Cannot throw dice before seeding it")
return random.randrange(start, stop)
@classmethod
def choice(cls, cList):
return random.choice(cList)
class Helper:
@classmethod
def convertErrno(cls, errno):
return errno if (errno > 0) else 0x80000000 + errno
@classmethod
def getFriendlyPath(cls, path): # returns .../xxx/yyy
ht1 = os.path.split(path)
ht2 = os.path.split(ht1[0])
return ".../" + ht2[1] + '/' + ht1[1]
class Progress:
STEP_BOUNDARY = 0
BEGIN_THREAD_STEP = 1
END_THREAD_STEP = 2
SERVICE_HEART_BEAT= 3
tokens = {
STEP_BOUNDARY: '.',
BEGIN_THREAD_STEP: '[',
END_THREAD_STEP: '] ',
SERVICE_HEART_BEAT: '.Y.'
}
@classmethod
def emit(cls, token):
print(cls.tokens[token], end="", flush=True)
此差异已折叠。
# -----!/usr/bin/python3.7
###################################################################
# Copyright (c) 2016 by TAOS Technologies, Inc.
# All rights reserved.
#
# This file is proprietary and confidential to TAOS Technologies.
# No part of this file may be reproduced, stored, transmitted,
# disclosed or used in any form or by any means other than as
# expressly provided by the written permission from Jianhui Tao
#
###################################################################
import sys
from crash_gen.crash_gen import MainExec
if __name__ == "__main__":
mExec = MainExec()
mExec.init()
exitCode = mExec.run()
print("Exiting with code: {}".format(exitCode))
sys.exit(exitCode)
Markdown is supported
0% .
You are about to add 0 people to the discussion. Proceed with caution.
先完成此消息的编辑!
想要评论请 注册