Stop-Press-2014年8月
从Pandoc 1.13开始,Pandoc现在包含了我对DokuWiki编写的实现-与该脚本相比,在那里实现了更多功能。因此,该脚本现在几乎是多余的。
最初,我不想编写Python脚本来进行转换,但最终还是那样做。
真正节省时间的步骤是使用Pandoc解析Markdown文本,并写出文档的JSON表示形式。这样,该JSON文件几乎就很容易解析,并以DokuWiki格式写出。
以下是脚本,该脚本实现了我关心的Markdown和DokuWiki的一些功能,以及其他一些功能。(我尚未上传我编写的相应测试套件)
使用要求:
- Python(我在Windows上使用2.7)
- 已安装Pandoc,并在PATH中放入pandoc.exe(或编辑脚本以放入Pandoc的完整路径)
我希望这也可以节省别人的时间...
编辑2:2013-06-26:我现在将此代码放入https://github.com/claremacrae/markdown_to_dokuwiki.py的 GitHub中。注意,那里的代码增加了对更多格式的支持,还包含一个测试套件。
编辑1:调整为以Markdown的反引号样式添加代码以解析代码示例:
# -*- coding: latin-1 -*-
import sys
import os
import json
__doc__ = """This script will read a text file in Markdown format,
and convert it to DokuWiki format.
The basic approach is to run pandoc to convert the markdown to JSON,
and then to parse the JSON output, and convert it to dokuwiki, which
is written to standard output
Requirements:
- pandoc is in the user's PATH
"""
# TODOs
# underlined, fixed-width
# Code quotes
list_depth = 0
list_depth_increment = 2
def process_list( list_marker, value ):
global list_depth
list_depth += list_depth_increment
result = ""
for item in value:
result += '\n' + list_depth * unicode( ' ' ) + list_marker + process_container( item )
list_depth -= list_depth_increment
if list_depth == 0:
result += '\n'
return result
def process_container( container ):
if isinstance( container, dict ):
assert( len(container) == 1 )
key = container.keys()[ 0 ]
value = container.values()[ 0 ]
if key == 'Para':
return process_container( value ) + '\n\n'
if key == 'Str':
return value
elif key == 'Header':
level = value[0]
marker = ( 7 - level ) * unicode( '=' )
return marker + unicode(' ') + process_container( value[1] ) + unicode(' ') + marker + unicode('\n\n')
elif key == 'Strong':
return unicode('**') + process_container( value ) + unicode('**')
elif key == 'Emph':
return unicode('//') + process_container( value ) + unicode('//')
elif key == 'Code':
return unicode("''") + value[1] + unicode("''")
elif key == "Link":
url = value[1][0]
return unicode('[[') + url + unicode('|') + process_container( value[0] ) + unicode(']]')
elif key == "BulletList":
return process_list( unicode( '* ' ), value)
elif key == "OrderedList":
return process_list( unicode( '- ' ), value[1])
elif key == "Plain":
return process_container( value )
elif key == "BlockQuote":
# There is no representation of blockquotes in DokuWiki - we'll just
# have to spit out the unmodified text
return '\n' + process_container( value ) + '\n'
#elif key == 'Code':
# return unicode("''") + process_container( value ) + unicode("''")
else:
return unicode("unknown map key: ") + key + unicode( " value: " ) + str( value )
if isinstance( container, list ):
result = unicode("")
for value in container:
result += process_container( value )
return result
if isinstance( container, unicode ):
if container == unicode( "Space" ):
return unicode( " " )
elif container == unicode( "HorizontalRule" ):
return unicode( "----\n\n" )
return unicode("unknown") + str( container )
def process_pandoc_jason( data ):
assert( len(data) == 2 )
result = unicode('')
for values in data[1]:
result += process_container( values )
print result
def convert_file( filename ):
# Use pandoc to parse the input file, and write it out as json
tempfile = "temp_script_output.json"
command = "pandoc --to=json \"%s\" --output=%s" % ( filename, tempfile )
#print command
os.system( command )
input_file = open(tempfile, 'r' )
input_text = input_file.readline()
input_file.close()
## Parse the data
data = json.loads( input_text )
process_pandoc_jason( data )
def main( files ):
for filename in files:
convert_file( filename )
if __name__ == "__main__":
files = sys.argv[1:]
if len( files ) == 0:
sys.stderr.write( "Supply one or more filenames to convert on the command line\n" )
return_code = 1
else:
main( files )
return_code = 0
sys.exit( return_code )