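Let me work through this question with a real example.
I need to calculate a weighted moving average (WMA) over my OHLC data. I have about 134,000 candles, each tagged with a symbol.
- Option 1: do it in Python / Node etc.
- Option 2: do it in SQL itself!
Which one is better?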
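- If I had to do this in Python, I would, in the worst case, have to fetch all the stored records, perform the computation, and save everything back, which in my opinion is a huge waste of IO.
- The weighted moving average changes every time a new candle arrives, so I would be doing that massive amount of IO at regular intervals, which is not a good idea in my case.
- In SQL, all I probably have to do is write a trigger that calculates and stores everything, so I only need to fetch the final WMA value for each pair every now and then, which is much more efficient.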
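Requirements
- If I had to calculate the WMA for every candle and store it, I would do it in Python.
- But since I only need the last value, SQL is much faster than Python.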
To give you some encouragement, here is the Python version of the weighted moving average.
WMA done through code
import psycopg2
import psycopg2.extras
from talib import func
import timeit
import numpy as np

with psycopg2.connect('dbname=xyz user=xyz') as conn:
    with conn.cursor() as cur:
        t0 = timeit.default_timer()
        cur.execute('select distinct symbol from ohlc_900 order by symbol')
        # Each row is a 1-tuple, so it can be passed directly as the query parameters
        for symbol in cur.fetchall():
            cur.execute('select c from ohlc_900 where symbol = %s order by ts', symbol)
            ohlc = np.array(cur.fetchall(), dtype = ([('c', 'f8')]))
            wma = func.WMA(ohlc['c'], 10)
            # print(*symbol, wma[-1])
        print(timeit.default_timer() - t0)
# The with block only ends the transaction, so the connection is closed explicitly
conn.close()
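To make it clearer what value is being computed for the last candle, here is a minimal pure-Python sketch. It assumes the usual linearly weighted definition (weights 1 through 10, divisor 55), which is also the definition the SQL version relies on. The function name `last_wma` and the sample prices are made up for illustration and are not part of the original code.

```python
def last_wma(closes, period=10):
    """Linearly weighted moving average of the last `period` closes.

    The oldest close in the window gets weight 1 and the newest gets
    weight `period`; the divisor is 1 + 2 + ... + period (55 for 10).
    Returns None when there are fewer than `period` closes.
    """
    if len(closes) < period:
        return None
    window = closes[-period:]        # oldest first, newest last
    weights = range(1, period + 1)   # 1, 2, ..., period
    return sum(w * c for w, c in zip(weights, window)) / sum(weights)


# Hypothetical closing prices, oldest first.
closes = [10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21]
print(last_wma(closes))  # 18.0
```

Only the last 10 closes contribute to the result, which is why fetching the full history for every symbol costs more IO than the calculation itself needs.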
WMA through SQL
"""
if the period is 10
then we need 9 previous candles or 15 x 9 = 135 mins on the interval department
we also need to start counting at row number - (count in that group - 10)
For example if AAPL had 134 candles and current row number was 125
weight at that row will be weight = 125 - (134 - 10) = 1
10 period WMA calculations
Row no  Weight  c
125     1
126     2
127     3
128     4
129     5
130     6
131     7
132     8
133     9
134     10
"""
query2 = """
WITH
condition(sym, maxts, cnt) as (
    select symbol, max(ts), count(symbol) from ohlc_900 group by symbol
),
cte as (
    select symbol, ts,
        case when cnt >= 10 and ts >= maxts - interval '135 mins'
            then (row_number() over (partition by symbol order by ts) - (cnt - 10)) * c
            else null
        end as weighted_close
    from ohlc_900
    INNER JOIN condition
    ON symbol = sym
    WINDOW
        w as (partition by symbol order by ts rows between 9 preceding and current row)
)
select symbol, sum(weighted_close)/55 as wma
from cte
WHERE weighted_close is NOT NULL
GROUP by symbol ORDER BY symbol
"""
with psycopg2.connect('dbname=xyz user=xyz') as conn:
    with conn.cursor() as cur:
        t0 = timeit.default_timer()
        cur.execute(query2)
        # for i in cur.fetchall():
        #     print(*i)
        print(timeit.default_timer() - t0)
conn.close()
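Believe it or not, the query runs faster than the pure Python version of the weighted moving average! I built that query up step by step, so hang in there and you will do just fine.
Speed
0.42141127300055814 seconds Python
0.23801879299935536 seconds SQL
I have 134,000 fake OHLC records in my database, divided among 1,000 stocks, so this is one example of where SQL can outperform your app server.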
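If you want to try the idea without a PostgreSQL instance, the following is a rough, self-contained sketch that uses Python's built-in `sqlite3` module as a stand-in. It is not the query from this answer: it is a simplified variant that ranks rows from the newest candle backward instead of joining on a per-symbol count, and it assumes an SQLite build with window function support (3.25 or later). The table name `ohlc`, the symbols, and the prices are all hypothetical, and timings on such a toy data set say nothing about the numbers above.

```python
import sqlite3

PERIOD = 10
DIVISOR = float(PERIOD * (PERIOD + 1) // 2)  # 55.0 for a 10-period WMA

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ohlc (symbol TEXT, ts INTEGER, c REAL)")

# Hypothetical data: two symbols, 12 candles each; ts is just a counter.
rows = [
    (symbol, ts, base + ts)
    for symbol, base in (("AAA", 10.0), ("BBB", 100.0))
    for ts in range(12)
]
conn.executemany("INSERT INTO ohlc VALUES (?, ?, ?)", rows)

# rn = 1 is the newest candle, so its weight is PERIOD + 1 - 1 = PERIOD.
query = """
WITH ranked AS (
    SELECT symbol, c,
           row_number() OVER (PARTITION BY symbol ORDER BY ts DESC) AS rn,
           count(*)     OVER (PARTITION BY symbol)                  AS cnt
    FROM ohlc
)
SELECT symbol, sum((:period + 1 - rn) * c) / :divisor AS wma
FROM ranked
WHERE rn <= :period AND cnt >= :period
GROUP BY symbol
ORDER BY symbol
"""
sql_result = dict(conn.execute(query, {"period": PERIOD, "divisor": DIVISOR}))

# Same calculation in the application: fetch every close, then weight the last 10.
py_result = {}
for symbol in ("AAA", "BBB"):
    cur = conn.execute("SELECT c FROM ohlc WHERE symbol = ? ORDER BY ts", (symbol,))
    window = [c for (c,) in cur][-PERIOD:]
    py_result[symbol] = sum(w * c for w, c in enumerate(window, start=1)) / DIVISOR

conn.close()
print(sql_result)  # {'AAA': 18.0, 'BBB': 108.0}
print(py_result)   # {'AAA': 18.0, 'BBB': 108.0}
```

The SQL path returns one row per symbol, while the application path pulls every close across the connection before discarding all but the last 10, which is the IO difference this answer is about.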